[jira] [Created] (FLINK-2155) Add an additional checkstyle validation for illegal imports
Lokesh Rajaram created FLINK-2155: - Summary: Add an additional checkstyle validation for illegal imports Key: FLINK-2155 URL: https://issues.apache.org/jira/browse/FLINK-2155 Project: Flink Issue Type: Improvement Reporter: Lokesh Rajaram Assignee: Lokesh Rajaram Add an additional checkstyle validation for illegal imports. To begin with, the following two package imports are marked as illegal: 1. org.apache.commons.lang3.Validate 2. org.apache.flink.shaded.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571638#comment-14571638 ] ASF GitHub Bot commented on FLINK-2098: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108615858 I thought about what @StephanEwen said about uncheckpointed sources also having the locking object in the signature of the run() method, and also about extensibility. We might have to tweak the source interface a little bit more. What I propose is to have this run method: ``` void run(SourceContext context); ``` Then the source context would have methods to retrieve the locking object (for checkpointed sources) and for emitting elements. Part of my motivation for this is that it can be extended in the future without breaking existing sources. If we introduce proper timestamps at some point, we can extend the SourceContext with a method for emitting elements with a timestamp. Then, if we want to have watermarks, the context can have methods for activating automatically generated watermarks and for emitting watermarks. And so on... I think we should fix this now, before the release. What do you think? Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108615858 I thought about what @StephanEwen said about uncheckpointed sources also having the locking object in the signature of the run() method and also about extensibility. We might have to tweak the source interface a little bit more. What I propose is to have this run method: ``` void run(SourceContext context); ``` Then the source context would have methods to retrieve the locking object (for checkpointed sources) and for emitting elements. Part of my motivation for this is that this can be extended in the future without breaking existing sources. If we introduce proper timestamps at some point we can extend the SourceContext with a method for emitting elements with a timestamp. Then, if we want to have watermarks the context can have methods for activating automatically generated watermarks and for emitting watermarks. And so on... I think we should fix this now, before the release. What do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
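To make the proposal concrete, the extensible source interface described in the comment could look roughly like the sketch below. All names, generics, and the example source are illustrative assumptions based on the discussion, not the API that was eventually merged into Flink:

```java
// Sketch of the proposed extensible source interface. Names are
// illustrative assumptions, not the actual Flink API.
public class SourceContextSketch {

    /** Context handed to a source's run() method. */
    public interface SourceContext<T> {
        /** Emit one element. */
        void collect(T element);
        /** Lock that synchronizes emission with checkpointing. */
        Object getCheckpointLock();
        // Future extensions could be added here without breaking
        // existing sources, e.g.:
        // void collectWithTimestamp(T element, long timestamp);
        // void emitWatermark(Watermark mark);
    }

    public interface SourceFunction<T> {
        void run(SourceContext<T> ctx) throws Exception;
    }

    /** A trivial source using the sketched interface. */
    public static class CounterSource implements SourceFunction<Long> {
        private final long n;
        public CounterSource(long n) { this.n = n; }
        @Override
        public void run(SourceContext<Long> ctx) {
            for (long i = 0; i < n; i++) {
                // Checkpointed sources emit under the lock so that
                // snapshots see a consistent state.
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(i);
                }
            }
        }
    }
}
```

The key design point is that adding methods to `SourceContext` later (timestamps, watermarks) leaves existing `SourceFunction` implementations untouched.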
[GitHub] flink pull request: [wip] [FLINK-2136] Adding DataStream tests for...
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/771#issuecomment-108573493 Please fix the scala style issues. :)
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571489#comment-14571489 ] ASF GitHub Bot commented on FLINK-2069: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/759 writeAsCSV function in DataStream Scala API creates no file --- Key: FLINK-2069 URL: https://issues.apache.org/jira/browse/FLINK-2069 Project: Flink Issue Type: Bug Components: Streaming Reporter: Faye Beligianni Priority: Blocker Labels: Streaming Fix For: 0.9 When the {{writeAsCSV}} function is used in the DataStream Scala API, no file is created in the specified path.
[GitHub] flink pull request: [FLINK-2069] Fix Scala CSV Output Format
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/759
[GitHub] flink pull request: [hotfix] Remove execute() after print() in Tab...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/735
[jira] [Created] (FLINK-2154) ActiveTriggerPolicy collects elements to a closed buffer after job finishes
Márton Balassi created FLINK-2154: - Summary: ActiveTriggerPolicy collects elements to a closed buffer after job finishes Key: FLINK-2154 URL: https://issues.apache.org/jira/browse/FLINK-2154 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi When gracefully finishing a time windowing job, I have witnessed the following exceptions. The thread triggering the active policy tries to collect the data, even though the buffer pool has already been destroyed because the operator has finished. 16:06:44,419 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - StreamDiscretizer - SlidingTimeGroupedPreReducer - (Filter, ExtractParts) (1/4) (a28aa6212bd0c4eca271c133eb86a223) switched from RUNNING to FINISHED 16:06:44,417 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Unregistering task and sending final execution state FINISHED to JobManager for task CoFlatMap - Window Flatten (891e684ea2c1d5155b2363057035fca0) 16:06:44,419 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 StreamDiscretizer - SlidingTimeGroupedPreReducer - (Filter, ExtractParts)(1/4) switched to FINISHED 16:06:44,419 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - StreamDiscretizer - SlidingTimeGroupedPreReducer - (Filter, ExtractParts) (3/4) (2e22e0ba633eb1319adc9b48ed7ff477) switched from RUNNING to FINISHED 16:06:44,420 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 StreamDiscretizer - SlidingTimeGroupedPreReducer - (Filter, ExtractParts)(3/4) switched to FINISHED 16:06:44,420 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Unregistering task and sending final execution state FINISHED to JobManager for task CoFlatMap - Window Flatten (dadf928e00ab1b6f9685bf08e8d447d8) 16:06:44,421 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Unregistering task and sending final execution state FINISHED to JobManager for task CoFlatMap - Window Flatten 
(e211fc565e81fbdaeda0c20702c83fab) 16:06:44,421 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Unregistering task and sending final execution state FINISHED to JobManager for task CoFlatMap - Window Flatten (e92b325076707068b32780d30a3355a9) 16:06:44,422 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CoFlatMap - Window Flatten (2/4) (891e684ea2c1d5155b2363057035fca0) switched from RUNNING to FINISHED 16:06:44,422 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CoFlatMap - Window Flatten (1/4) (dadf928e00ab1b6f9685bf08e8d447d8) switched from RUNNING to FINISHED 16:06:44,422 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 CoFlatMap - Window Flatten(2/4) switched to FINISHED 16:06:44,423 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 CoFlatMap - Window Flatten(1/4) switched to FINISHED 16:06:44,423 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CoFlatMap - Window Flatten (3/4) (e211fc565e81fbdaeda0c20702c83fab) switched from RUNNING to FINISHED 16:06:44,424 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 CoFlatMap - Window Flatten(3/4) switched to FINISHED 16:06:44,424 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CoFlatMap - Window Flatten (4/4) (e92b325076707068b32780d30a3355a9) switched from RUNNING to FINISHED 16:06:44,424 INFO org.apache.flink.runtime.client.JobClient - 06/03/2015 16:06:44 CoFlatMap - Window Flatten(4/4) switched to FINISHED 16:06:49,151 ERROR org.apache.flink.streaming.api.collector.StreamOutput - Emit failed due to: java.lang.IllegalStateException: Buffer pool is destroyed. 
at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBuffer(LocalBufferPool.java:144) at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:133) at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:92) at org.apache.flink.streaming.runtime.io.StreamRecordWriter.emit(StreamRecordWriter.java:58) at org.apache.flink.streaming.api.collector.StreamOutput.collect(StreamOutput.java:62) at org.apache.flink.streaming.api.collector.CollectorWrapper.collect(CollectorWrapper.java:40) at org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:34) at org.apache.flink.streaming.runtime.tasks.OutputHandler$OperatorCollector.collect(OutputHandler.java:232) at org.apache.flink.streaming.api.collector.CollectorWrapper.collect(CollectorWrapper.java:40) at
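One common way to avoid this class of shutdown race (a sketch only, not the fix that was actually applied in Flink; all names here are illustrative) is to have the asynchronous trigger thread check a running flag, set during operator shutdown, before it collects into the output:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of guarding an asynchronous trigger thread against emitting
// into an operator whose buffer pool is already destroyed.
// Illustrative only; not the actual Flink fix.
public class GuardedEmitter {
    private final List<String> buffer = new ArrayList<>();
    private boolean running = true;

    /** Called by the active trigger thread; drops the element once closed. */
    public synchronized boolean tryCollect(String element) {
        if (!running) {
            return false; // operator finished; emitting would fail
        }
        buffer.add(element);
        return true;
    }

    /** Called when the operator finishes; later emissions are rejected. */
    public synchronized void close() {
        running = false;
    }

    public synchronized int size() {
        return buffer.size();
    }
}
```

The emit and close paths synchronize on the same monitor, so the trigger thread can never observe a half-closed operator.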
[jira] [Commented] (FLINK-2136) Test the streaming scala API
[ https://issues.apache.org/jira/browse/FLINK-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571484#comment-14571484 ] ASF GitHub Bot commented on FLINK-2136: --- Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/771#issuecomment-108573493 Please fix the scala style issues. :) Test the streaming scala API Key: FLINK-2136 URL: https://issues.apache.org/jira/browse/FLINK-2136 Project: Flink Issue Type: Test Components: Scala API, Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann There are no tests covering the streaming Scala API. I would suggest testing whether the StreamGraph created by a certain operation looks as expected. Deeper layers and the runtime should not be tested here; that is done in streaming-core.
[jira] [Commented] (FLINK-2155) Add an additional checkstyle validation for illegal imports
[ https://issues.apache.org/jira/browse/FLINK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571577#comment-14571577 ] Ufuk Celebi commented on FLINK-2155: Thanks! Are there more Validate versions we could exclude? I think lang3 is a commons version? Add an additional checkstyle validation for illegal imports --- Key: FLINK-2155 URL: https://issues.apache.org/jira/browse/FLINK-2155 Project: Flink Issue Type: Improvement Reporter: Lokesh Rajaram Assignee: Lokesh Rajaram Add an additional checkstyle validation for illegal imports. To begin with, the following two package imports are marked as illegal: 1. org.apache.commons.lang3.Validate 2. org.apache.flink.shaded.*
[GitHub] flink pull request: [FLINK-2134] Close Netty channel via CloseRequ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108627084 Can you elaborate? Why are there backwards events after the connection is closed? The iteration head should not close until the iteration terminates, in which case there should be no back events any more.
[jira] [Commented] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase
[ https://issues.apache.org/jira/browse/FLINK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571716#comment-14571716 ] ASF GitHub Bot commented on FLINK-2134: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108627084 Can you elaborate? Why are there backwards events after the connection is closed? The iteration head should not close until the iteration terminates, in which case there should be no back events any more. Deadlock in SuccessAfterNetworkBuffersFailureITCase --- Key: FLINK-2134 URL: https://issues.apache.org/jira/browse/FLINK-2134 Project: Flink Issue Type: Bug Affects Versions: master Reporter: Ufuk Celebi I ran into the issue in a Travis run for a PR: https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt I can reproduce this locally by running SuccessAfterNetworkBuffersFailureITCase multiple times: {code} cluster = new ForkableFlinkMiniCluster(config, false); for (int i = 0; i < 100; i++) { // run test programs CC, KMeans, CC } {code} The iteration tasks wait for superstep notifications like this: {code} Join (Join at runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) (8/6) daemon prio=5 tid=0x7f95f374f800 nid=0x138a7 in Object.wait() [0x000123f2a000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57) - locked 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} I've asked [~rmetzger] to reproduce this and it deadlocks for him as well. The system needs to be under some load for this to occur after multiple runs.
[jira] [Commented] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase
[ https://issues.apache.org/jira/browse/FLINK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571748#comment-14571748 ] ASF GitHub Bot commented on FLINK-2134: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108630865 Makes sense. +1 to get this in! Deadlock in SuccessAfterNetworkBuffersFailureITCase --- Key: FLINK-2134 URL: https://issues.apache.org/jira/browse/FLINK-2134 Project: Flink Issue Type: Bug Affects Versions: master Reporter: Ufuk Celebi I ran into the issue in a Travis run for a PR: https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt I can reproduce this locally by running SuccessAfterNetworkBuffersFailureITCase multiple times: {code} cluster = new ForkableFlinkMiniCluster(config, false); for (int i = 0; i < 100; i++) { // run test programs CC, KMeans, CC } {code} The iteration tasks wait for superstep notifications like this: {code} Join (Join at runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) (8/6) daemon prio=5 tid=0x7f95f374f800 nid=0x138a7 in Object.wait() [0x000123f2a000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57) - locked 0x0007f89e3440 (a java.lang.Object) at org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} I've asked [~rmetzger] to reproduce this and it deadlocks for him as well. The system needs to be under some load for this to occur after multiple runs.
[GitHub] flink pull request: [FLINK-2134] Close Netty channel via CloseRequ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108630865 Makes sense. +1 to get this in!
[GitHub] flink pull request: Hits
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31677964 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/HITS.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; +import org.apache.flink.api.common.aggregators.LongSumAggregator; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.graph.utils.Hits; + +import java.io.Serializable; + + +/** + * + * This class implements the HITS algorithm by using flink Gelly API +*Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates Web pages, +*developed by Jon Kleinberg. 
+ * + * The algorithm performs a series of iterations, each consisting of two basic steps: + * + * Authority Update: Update each node's Authority score to be equal to the sum of the Hub Scores of each node that + * points to it. + * That is, a node is given a high authority score by being linked from pages that are recognized as Hubs for information. + * Hub Update: Update each node's Hub Score to be equal to the sum of the Authority Scores of each node that it + * points to. + * That is, a node is given a high hub score by linking to nodes that are considered to be authorities on the subject. + * + * The Hub score and Authority score for a node is calculated with the following algorithm: + * *Start with each node having a hub score and authority score of 1. + * *Run the Authority Update Rule + * *Run the Hub Update Rule + * *Normalize the values by dividing each Hub score by square root of the sum of the squares of all Hub scores, and + * dividing each Authority score by square root of the sum of the squares of all Authority scores. + * *Repeat from the second step as necessary. + * + * http://en.wikipedia.org/wiki/HITS_algorithm --- End diff -- I would add an @see annotation for the link here, check the community detection library method.
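The update steps quoted in the javadoc above can be sketched in plain Java on an adjacency matrix. This is illustrative only; the Gelly implementation under review expresses the same updates with vertex-centric iterations rather than matrices:

```java
import java.util.Arrays;

// Plain-Java sketch of the HITS steps: start all scores at 1, run the
// authority update, then the hub update, then normalize each vector by
// the square root of its sum of squares. Not the Gelly implementation.
public class HitsSketch {

    /** adj[i][j] == true means an edge i -> j. Returns {hubs, authorities}. */
    public static double[][] run(boolean[][] adj, int iterations) {
        int n = adj.length;
        double[] hub = new double[n];
        double[] auth = new double[n];
        Arrays.fill(hub, 1.0);   // every node starts with score 1
        Arrays.fill(auth, 1.0);
        for (int it = 0; it < iterations; it++) {
            // Authority update: sum of hub scores of in-neighbors.
            for (int j = 0; j < n; j++) {
                auth[j] = 0;
                for (int i = 0; i < n; i++) {
                    if (adj[i][j]) auth[j] += hub[i];
                }
            }
            // Hub update: sum of the new authority scores of out-neighbors.
            for (int i = 0; i < n; i++) {
                hub[i] = 0;
                for (int j = 0; j < n; j++) {
                    if (adj[i][j]) hub[i] += auth[j];
                }
            }
            normalize(hub);
            normalize(auth);
        }
        return new double[][] { hub, auth };
    }

    private static void normalize(double[] v) {
        double sumSq = 0;
        for (double x : v) sumSq += x * x;
        double norm = Math.sqrt(sumSq);
        if (norm > 0) {
            for (int i = 0; i < v.length; i++) v[i] /= norm;
        }
    }
}
```

On the chain 0 -> 1 -> 2, node 0 ends up a pure hub (it links to an authority but nothing links to it) and node 2 a pure authority.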
[jira] [Closed] (FLINK-1907) Scala Interactive Shell
[ https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-1907. --- Scala Interactive Shell --- Key: FLINK-1907 URL: https://issues.apache.org/jira/browse/FLINK-1907 Project: Flink Issue Type: New Feature Components: Scala API Reporter: Nikolaas Steenbergen Assignee: Nikolaas Steenbergen Priority: Minor Fix For: 0.9 Build an interactive shell for the Scala API.
[jira] [Resolved] (FLINK-1907) Scala Interactive Shell
[ https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1907. - Resolution: Implemented Fix Version/s: 0.9 Scala Interactive Shell --- Key: FLINK-1907 URL: https://issues.apache.org/jira/browse/FLINK-1907 Project: Flink Issue Type: New Feature Components: Scala API Reporter: Nikolaas Steenbergen Assignee: Nikolaas Steenbergen Priority: Minor Fix For: 0.9 Build an interactive shell for the Scala API.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571672#comment-14571672 ] ASF GitHub Bot commented on FLINK-2098: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108619295 Yes, the change can basically be done by a regex, so I also propose merging this as early as possible now. By the way, we could ensure that the source is actually holding the monitor lock with `Thread.holdsLock(obj)`. Not sure about the performance impact, though. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108619295 Yes, the change can basically be done by a regex so I also propose merging this as early as possible now. By the way, we could ensure that the source is actually holding the monitor lock with `Thread.holdsLock(obj)`. Not sure about the performance impact, though. 
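The `Thread.holdsLock(obj)` check mentioned above is standard JDK API and works like this minimal sketch (the class and method names here are illustrative, not Flink code):

```java
// Minimal demonstration of Thread.holdsLock(obj): the emit path can
// verify that the calling thread holds the checkpoint lock. Names are
// illustrative, not the actual Flink classes.
public class HoldsLockSketch {
    private final Object checkpointLock = new Object();

    public Object getCheckpointLock() {
        return checkpointLock;
    }

    /** True iff the calling thread currently holds the checkpoint lock. */
    public boolean emittingUnderLock() {
        return Thread.holdsLock(checkpointLock);
    }
}
```

The check is per calling thread, so it could be placed (or asserted) inside the emit path to catch sources that forget to synchronize.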
[GitHub] flink pull request: Remove extra HTML tags in TypeInformation Java...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/766#issuecomment-108619345 +1, fair to merge :-)
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108618236 How about we still merge this now, to make sure we have a good version in to start testing? The change you propose is API only, and would not affect internals/timings...
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571809#comment-14571809 ] ASF GitHub Bot commented on FLINK-1319: --- Github user twalthr commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108640926 Hey Ufuk, thank you very much for reviewing my code, and all others for the feedback! I tried to consider all your feedback (I hope I didn't forget anything). I did a large refactoring again, added some comments to important parts of the code, and fixed some bugs. I also added some additional test cases. I hope the PR is now ready to be merged (if the build succeeds) :) Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's optimizer takes information that tells it, for UDFs, which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
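To illustrate what "forwarded/constant field" information means for the optimizer, here is a plain-Java stand-in (it deliberately does not use Flink's actual annotation or tuple classes; the record shape and names are assumptions for illustration):

```java
// Plain-Java illustration of a "constant/forwarded field": this map
// copies field 0 of the input unchanged into field 0 of the output,
// so an optimizer that knows this can keep a partitioning or sort
// order on field 0 across the map, avoiding a re-shuffle.
// Sketch only; not Flink's actual semantic-annotation API.
public class ForwardedFieldSketch {

    /** A two-field record, e.g. (key, value). */
    public static final class Tuple2 {
        public final long f0;
        public final String f1;
        public Tuple2(long f0, String f1) {
            this.f0 = f0;
            this.f1 = f1;
        }
    }

    /** Field 0 is forwarded verbatim; only field 1 is modified. */
    public static Tuple2 map(Tuple2 in) {
        return new Tuple2(in.f0, in.f1.toUpperCase());
    }
}
```

The static analysis proposed in the issue would derive "field 0 is forwarded" automatically from the UDF's bytecode, instead of requiring the user to annotate it.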
[jira] [Resolved] (FLINK-2070) Confusing methods print() that print on client vs on TaskManager
[ https://issues.apache.org/jira/browse/FLINK-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2070. - Resolution: Fixed Assignee: Stephan Ewen Fixed via 11643c0cc79eabe02e952e6fbd56d7a55166b623 Confusing methods print() that print on client vs on TaskManager Key: FLINK-2070 URL: https://issues.apache.org/jira/browse/FLINK-2070 Project: Flink Issue Type: Bug Components: Core Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.9 With the {{print()}} method printing on the client, the {{print(sourceIdentifier)}} method becomes confusing, as it has the same name, but prints on the TaskManager (into its out files) and executes lazily. We should clarify the confusion by picking a more descriptive name, like {{printOnTaskManager()}}. I am not sure how common the use case of printing into the TaskManager {{out}} files is. We could remove that method and point to the {{PrintingOutputFormat}} for that.
[jira] [Closed] (FLINK-2070) Confusing methods print() that print on client vs on TaskManager
[ https://issues.apache.org/jira/browse/FLINK-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2070. --- Confusing methods print() that print on client vs on TaskManager Key: FLINK-2070 URL: https://issues.apache.org/jira/browse/FLINK-2070 Project: Flink Issue Type: Bug Components: Core Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.9 With the {{print()}} method printing on the client, the {{print(sourceIdentifier)}} method becomes confusing, as it has the same name, but prints on the TaskManager (into its out files) and executes lazily. We should clarify the confusion by picking a more descriptive name, like {{printOnTaskManager()}}. I am not sure how common the use case of printing into the TaskManager {{out}} files is. We could remove that method and point to the {{PrintingOutputFormat}} for that.
[jira] [Commented] (FLINK-2151) Provide interface to distinguish close() calls in error and regular cases
[ https://issues.apache.org/jira/browse/FLINK-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571704#comment-14571704 ] Stephan Ewen commented on FLINK-2151: - For rich functions, we can add a {{signalCancel()}} method. That breaks nothing, because they are regular classes. For data sinks, it is more tricky. We can add a {{CancelableSink}} Provide interface to distinguish close() calls in error and regular cases - Key: FLINK-2151 URL: https://issues.apache.org/jira/browse/FLINK-2151 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.9 Reporter: Robert Metzger I was talking to somebody who is interested in contributing a {{flink-cassandra}} connector. The connector will create Cassandra files locally (on the TaskManagers) and bulk-load them in the {{close()}} method. For user functions, it is currently not possible to find out whether the function is being closed due to an error or a regular end. The simplest approach would be passing an additional argument (enum or boolean) into the close() method, indicating the type of closing. But that would break all existing code. Another approach would be to add an interface that has such an extended close method, e.g. {{RichCloseFunction}}.
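The two directions discussed in the comment and the issue could be sketched as follows. All interface and class names here are illustrative assumptions drawn from the discussion, not a committed Flink API:

```java
// Sketch of the two options discussed for FLINK-2151: an extra
// signalCancel() hook on rich functions, and a close(boolean) style
// interface for sinks. Names are illustrative, not a Flink API.
public class CloseSignalSketch {

    /** Option 1: an extra hook invoked only on error/cancellation. */
    public interface Cancelable {
        void signalCancel();
    }

    /** Option 2: close() that knows whether the job ended regularly. */
    public interface CancelableSink {
        void close(boolean regularEnd) throws Exception;
    }

    /** Example sink that bulk-loads only on a regular end. */
    public static class BulkLoadingSink implements CancelableSink {
        public boolean loaded = false;

        @Override
        public void close(boolean regularEnd) {
            if (regularEnd) {
                loaded = true; // e.g. trigger the Cassandra bulk load here
            }
            // On error, a real sink would instead clean up its local files.
        }
    }
}
```

Option 1 is backward compatible for rich functions (a new method on a class); option 2 needs a new interface so existing sinks with the plain `close()` signature keep compiling.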
[jira] [Commented] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase
[ https://issues.apache.org/jira/browse/FLINK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571730#comment-14571730 ] ASF GitHub Bot commented on FLINK-2134: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108628742 No, there are no backwards events *after* the channel is closed. The sync sends out the backwards events, then closes. But the close could overtake the unflushed backwards events. This led to a deadlock, because the head was waiting on termination events, which never arrived. Deadlock in SuccessAfterNetworkBuffersFailureITCase --- Key: FLINK-2134 URL: https://issues.apache.org/jira/browse/FLINK-2134 Project: Flink Issue Type: Bug Affects Versions: master Reporter: Ufuk Celebi I ran into the issue in a Travis run for a PR: https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt I can reproduce this locally by running SuccessAfterNetworkBuffersFailureITCase multiple times:
{code}
cluster = new ForkableFlinkMiniCluster(config, false);

for (int i = 0; i < 100; i++) {
    // run test programs CC, KMeans, CC
}
{code}
The iteration tasks wait for superstep notifications like this:
{code}
"Join (Join at runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) (8/6)" daemon prio=5 tid=0x7f95f374f800 nid=0x138a7 in Object.wait() [0x000123f2a000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x0007f89e3440> (a java.lang.Object)
	at org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57)
	- locked <0x0007f89e3440> (a java.lang.Object)
	at org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Thread.java:745)
{code}
I've asked [~rmetzger] to reproduce this and it deadlocks for him as well. The system needs to be under some load for this to occur after multiple runs.
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108627853 I was thinking the same thing, about `Thread.holdsLock(obj)`. That call probably costs way more than the lock itself, though. Would be nice to have a debug mode, to activate it there. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571725#comment-14571725 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108627853 I was thinking the same thing, about `Thread.holdsLock(obj)`. That call probably costs way more than the lock itself, though. Would be nice to have a debug mode, to activate it there. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
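The debug-mode idea above (checking lock ownership only when it costs nothing in production) can be sketched with JVM assertions. This is an editor's illustration, not Flink's actual `SourceContext` API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical source context: element emission is guarded by an assert on
// Thread.holdsLock(), so the relatively expensive check only runs when the
// JVM is started with -ea, i.e. in a "debug mode".
public class LockingSourceContext<T> {
    private final Object checkpointLock;
    private final List<T> emitted = new ArrayList<>(); // stand-in for the downstream

    public LockingSourceContext(Object checkpointLock) {
        this.checkpointLock = checkpointLock;
    }

    public Object getCheckpointLock() {
        return checkpointLock;
    }

    public void collect(T element) {
        // Thread.holdsLock probably costs more than the lock itself,
        // hence assert-only
        assert Thread.holdsLock(checkpointLock) : "collect() called without the checkpoint lock";
        emitted.add(element);
    }

    public List<T> getEmitted() {
        return emitted;
    }
}
```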
[GitHub] flink pull request: [FLINK-2134] Close Netty channel via CloseRequ...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108628742 No, there are no backwards events *after* the channel is closed. The sync sends out the backwards events, then closes. But the close could overtake the unflushed backwards events. This led to a deadlock, because the head was waiting on termination events, which never arrived.
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571743#comment-14571743 ] ASF GitHub Bot commented on FLINK-1319: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108630545 Great review, Ufuk. I agree with @uce and @rmetzger to add a comment on how to disable it (in case). Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder.
This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-108630545 Great review, Ufuk. I agree with @uce and @rmetzger to add a comment how to disable it (in case).
[GitHub] flink pull request: [FLINK-2134] Close Netty channel via CloseRequ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/773#issuecomment-108630661 Ah, it is a close from the receiver end, got it.
[GitHub] flink pull request: Hits
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/765#issuecomment-108634942 Hi @mfahimazizi, The common practice is to prefix the commit with the Jira issue, e.g. [FLINK-20XX][gelly]. Also, you should play with your IDE's settings. Right now, indentation is performed with spaces and it should be with tabs. This is why Travis is failing. To make sure the code works, do a `cd flink-staging/flink-gelly` and then `mvn verify`. A success there equals a Travis success 98% of the time :) After implementing an algorithm, it's always good to check whether it works correctly by writing a test (check the test/example/ folder :) ). I would also update the documentation here. In the Library method section, add an entry *HITS*, perhaps with the full name of the acronym as well. I left some comments in-line. Overall this does not look bad at all!
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108618014 I generally like that idea. Especially the extensibility with respect to timestamps and watermark generation is a good point. Retrieving the lock object from the context is not very obvious, but then again, someone who implements a fault tolerant exactly-once source should read the javadocs and have a look at an example.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571665#comment-14571665 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108618236 How about we still merge this now, to make sure we have a good version in to start testing? The change you propose is API only, and would not affect internals/timings... Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571661#comment-14571661 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-108618014 I generally like that idea. Especially the extensibility with respect to timestamps and watermark generation is a good point. Retrieving the lock object from the context is not very obvious, but then again, someone who implements a fault tolerant exactly-once source should read the javadocs and have a look at an example. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571677#comment-14571677 ] Stephan Ewen commented on FLINK-1967: - With next version, you mean 0.10, right? I am still in favor. Making all windows data driven (timestamps are data) should help simplify the implementation as well. Introduce (Event)time in Streaming -- Key: FLINK-1967 URL: https://issues.apache.org/jira/browse/FLINK-1967 Project: Flink Issue Type: Improvement Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek This requires introducing a timestamp in the streaming record and a change in the sources to add timestamps to records. This will also introduce punctuations (or low watermarks) to allow windows to work correctly on unordered, timestamped input data. In the process of this, the windowing subsystem also needs to be adapted to use the punctuations. Furthermore, all operators need to be made aware of punctuations and correctly forward them. Then, a new operator must be introduced to allow modification of timestamps.
[GitHub] flink pull request: Hits
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31678387

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/HITSExample.java ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.example;
+
+import org.apache.flink.api.common.ProgramDescription;
+import org.apache.flink.api.common.functions.FlatMapFunction;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.library.HITS;
+import org.apache.flink.graph.utils.Hits;
+import org.apache.flink.graph.utils.Tuple3ToEdgeMap;
+import org.apache.flink.util.Collector;
+
+/**
+ * This program is an example for HITS algorithm.
+ * the result is either a hub value or authority value base user selection.
+ *
+ * If no arguments are provided, the example runs with a random graph of 10 vertices
+ * and random edge weights.
+ */
+
+public class HITSExample implements ProgramDescription {
+	@SuppressWarnings("serial")
+	public static void main(String[] args) throws Exception {
+
+		if(!parseParameters(args)) {
+			return;
+		}
+
+		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+		DataSet<Edge<Long, String>> links = getEdgesDataSet(env);
+
+		Graph<Long, Double, String> network = Graph.fromDataSet(links, new MapFunction<Long, Double>() {
+
+			public Double map(Long value) throws Exception {
+				return 1.0;
+			}
+		}, env);
+
+		// add graph to HITS class with iteration value and hub or authority enum value.
+		DataSet<Vertex<Long, Double>> HitsValue = network.run(
+				new HITS<Long>(Hits.AUTHORITY, maxIterations)).getVertices();
+
+		if (fileOutput) {
+			HitsValue.writeAsCsv(outputPath, "\n", "\t");
+		} else {
+			HitsValue.print();
+		}
+		//env.execute("HITS algorithm");
+	}
+
+	@Override
+	public String getDescription() {
+		return "HITS algorithm example";
+	}
+
+	// *
+	// UTIL METHODS
+	// *
+
+	private static boolean fileOutput = false;
+	private static long numPages = 10;
+	private static String edgeInputPath = null;
+	private static String outputPath = null;
+	private static int maxIterations = 5;
+
+	private static boolean parseParameters(String[] args) {
+
+		if(args.length > 0) {
+			if(args.length != 3) {
+				System.err.println("Usage: PageRank <input edges path> <output path> <num iterations>");
--- End diff --

It's no longer Page Rank ;) let's not confuse people!
[GitHub] flink pull request: Hits
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31678355

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/HITSExample.java ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.example;
+
+import org.apache.flink.api.common.ProgramDescription;
+import org.apache.flink.api.common.functions.FlatMapFunction;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.library.HITS;
+import org.apache.flink.graph.utils.Hits;
+import org.apache.flink.graph.utils.Tuple3ToEdgeMap;
+import org.apache.flink.util.Collector;
+
+/**
+ * This program is an example for HITS algorithm.
+ * the result is either a hub value or authority value base user selection.
+ *
+ * If no arguments are provided, the example runs with a random graph of 10 vertices
+ * and random edge weights.
+ */
+
+public class HITSExample implements ProgramDescription {
+	@SuppressWarnings("serial")
+	public static void main(String[] args) throws Exception {
+
+		if(!parseParameters(args)) {
+			return;
+		}
+
+		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+		DataSet<Edge<Long, String>> links = getEdgesDataSet(env);
+
+		Graph<Long, Double, String> network = Graph.fromDataSet(links, new MapFunction<Long, Double>() {
+
+			public Double map(Long value) throws Exception {
+				return 1.0;
+			}
+		}, env);
+
+		// add graph to HITS class with iteration value and hub or authority enum value.
+		DataSet<Vertex<Long, Double>> HitsValue = network.run(
+				new HITS<Long>(Hits.AUTHORITY, maxIterations)).getVertices();
+
+		if (fileOutput) {
+			HitsValue.writeAsCsv(outputPath, "\n", "\t");
+		} else {
+			HitsValue.print();
+		}
+		//env.execute("HITS algorithm");
--- End diff --

The env.execute() caused problems for you because you were only testing print(). You don't need to do an env.execute() on the else branch as print() now has the same status as count() and collect(), but this command is still needed on the then branch (i.e. after writeAsCsv).
[jira] [Commented] (FLINK-1528) Add local clustering coefficient library method and example
[ https://issues.apache.org/jira/browse/FLINK-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570573#comment-14570573 ] ASF GitHub Bot commented on FLINK-1528: --- Github user balidani commented on the pull request: https://github.com/apache/flink/pull/420#issuecomment-108284218 Yeah, I should definitely finish this! I'll take a look tonight, sorry about that :) Add local clustering coefficient library method and example --- Key: FLINK-1528 URL: https://issues.apache.org/jira/browse/FLINK-1528 Project: Flink Issue Type: Task Components: Gelly Reporter: Vasia Kalavri Assignee: Daniel Bali Add a gelly library method and example to compute the local clustering coefficient.
[GitHub] flink pull request: [FLINK-1528][Gelly] Added Local Clustering Coe...
Github user balidani commented on the pull request: https://github.com/apache/flink/pull/420#issuecomment-108284218 Yeah, I should definitely finish this! I'll take a look tonight, sorry about that :)
[GitHub] flink pull request: [contrib] Storm compatibility
Github user szape commented on the pull request: https://github.com/apache/flink/pull/764#issuecomment-108284180 The first 10 commits are not rebased on the current master.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570586#comment-14570586 ] Vasia Kalavri commented on FLINK-1520: -- Hey [~cebe]! One more ping to you :) If you're not working on this, can I release this issue? Thanks! Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Carsten Brandt Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs.
[jira] [Updated] (FLINK-2147) Approximate calculation of frequencies in data streams
[ https://issues.apache.org/jira/browse/FLINK-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2147: --- Labels: statistics (was: ) Approximate calculation of frequencies in data streams -- Key: FLINK-2147 URL: https://issues.apache.org/jira/browse/FLINK-2147 Project: Flink Issue Type: Sub-task Components: Streaming Reporter: Gabor Gevay Priority: Minor Labels: statistics Count-Min sketch is a hashing-based algorithm for approximately keeping track of the frequencies of elements in a data stream. It is described by Cormode et al. in the following paper: http://dimacs.rutgers.edu/~graham/pubs/papers/cmsoft.pdf Note that this algorithm can be conveniently implemented in a distributed way, as described in section 3.2 of the paper. The paper http://www.vldb.org/conf/2002/S10P03.pdf also describes algorithms for approximately keeping track of frequencies, but here the user can specify a threshold below which she is not interested in the frequency of an element. The error bounds are also different from those of the Count-Min sketch algorithm.
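The update/query scheme of the Count-Min sketch described above can be sketched as follows; this is an editor's illustration with simplified per-row hashing (the paper uses a pairwise-independent hash family), not code from the paper or from Flink:

```java
import java.util.Random;

// Minimal Count-Min sketch: d rows of w counters, one hash per row.
// add() increments one counter per row; estimate() takes the minimum
// over the rows, so frequencies are never under-estimated.
public class CountMinSketch {
    private final long[][] counts;
    private final int[] hashA; // per-row multipliers (simplified hashing)
    private final int depth, width;

    public CountMinSketch(int depth, int width, long seed) {
        this.depth = depth;
        this.width = width;
        this.counts = new long[depth][width];
        this.hashA = new int[depth];
        Random rnd = new Random(seed);
        for (int i = 0; i < depth; i++) {
            hashA[i] = rnd.nextInt(Integer.MAX_VALUE - 1) + 1; // non-zero
        }
    }

    private int bucket(int row, Object item) {
        return Math.floorMod(hashA[row] * item.hashCode(), width);
    }

    public void add(Object item) {
        for (int i = 0; i < depth; i++) {
            counts[i][bucket(i, item)]++;
        }
    }

    public long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++) {
            min = Math.min(min, counts[i][bucket(i, item)]);
        }
        return min;
    }
}
```

Two sketches built with the same seed merge by element-wise addition of their counter arrays, which is what makes the distributed variant in section 3.2 of the paper convenient.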
[jira] [Created] (FLINK-2148) Approximately calculate the number of distinct elements of a stream
Gabor Gevay created FLINK-2148: -- Summary: Approximately calculate the number of distinct elements of a stream Key: FLINK-2148 URL: https://issues.apache.org/jira/browse/FLINK-2148 Project: Flink Issue Type: Sub-task Reporter: Gabor Gevay Priority: Minor In the paper http://people.seas.harvard.edu/~minilek/papers/f0.pdf Kane et al. describe an optimal algorithm for estimating the number of distinct elements in a data stream.
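As a much simpler (non-optimal) alternative to the Kane et al. algorithm referenced above, distinct counting is often sketched with a k-minimum-values (KMV) estimator; this is an editor's illustration, with a hypothetical hash-mixing scheme:

```java
import java.util.TreeSet;

// KMV estimator: keep the k smallest hash values seen so far. If the k-th
// smallest hash sits at fraction p of the hash space, roughly k/p distinct
// elements have been observed.
public class KMVDistinctCounter {
    private final int k;
    private final TreeSet<Long> smallest = new TreeSet<>();

    public KMVDistinctCounter(int k) {
        this.k = k;
    }

    private static long hash(Object item) {
        long h = item.hashCode() * 0x9E3779B97F4A7C15L; // mixing constant (assumption)
        h ^= h >>> 32;
        return h & Long.MAX_VALUE; // keep it non-negative
    }

    public void add(Object item) {
        long h = hash(item);
        if (smallest.size() < k) {
            smallest.add(h); // duplicates are ignored by the set
        } else if (h < smallest.last() && !smallest.contains(h)) {
            smallest.add(h);
            smallest.pollLast(); // evict the previous k-th smallest
        }
    }

    public double estimate() {
        if (smallest.size() < k) {
            return smallest.size(); // fewer than k distinct hashes: exact
        }
        return (k - 1) * ((double) Long.MAX_VALUE / smallest.last());
    }
}
```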
[jira] [Created] (FLINK-2150) Add a library method that assigns unique Long values to vertices
Vasia Kalavri created FLINK-2150: Summary: Add a library method that assigns unique Long values to vertices Key: FLINK-2150 URL: https://issues.apache.org/jira/browse/FLINK-2150 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Priority: Minor In some graph algorithms, it is required to initialize the vertex values with unique values (e.g. label propagation). This issue proposes adding a Gelly library method that receives an input graph and initializes its vertex values with unique Long values. This method can then also be used to improve the MusicProfiles example.
[jira] [Updated] (FLINK-2145) Median calculation for windows
[ https://issues.apache.org/jira/browse/FLINK-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2145: --- Labels: statistics (was: ) Median calculation for windows -- Key: FLINK-2145 URL: https://issues.apache.org/jira/browse/FLINK-2145 Project: Flink Issue Type: Sub-task Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Labels: statistics The PreReducer for this has the following algorithm: We maintain two multisets (as, for example, balanced binary search trees) that always partition the elements of the current window into smaller-than-median and larger-than-median elements. At each store and evict, we can maintain this invariant with only O(1) multiset operations.
[jira] [Created] (FLINK-2145) Median calculation for windows
Gabor Gevay created FLINK-2145: -- Summary: Median calculation for windows Key: FLINK-2145 URL: https://issues.apache.org/jira/browse/FLINK-2145 Project: Flink Issue Type: Sub-task Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor The PreReducer for this has the following algorithm: We maintain two multisets (as, for example, balanced binary search trees) that always partition the elements of the current window into smaller-than-median and larger-than-median elements. At each store and evict, we can maintain this invariant with only O(1) multiset operations.
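The two-multiset scheme described above can be sketched like this; an editor's illustration using TreeMap-backed multisets (O(1)-many multiset operations per store/evict, each costing the usual O(log n) of a balanced tree):

```java
import java.util.TreeMap;

// Two multisets partition the window: `low` holds the smaller half
// (including the median), `high` the larger half. store/evict each trigger
// at most O(1) multiset moves to restore the size invariant.
public class MedianMaintainer {
    private final TreeMap<Double, Integer> low = new TreeMap<>();
    private final TreeMap<Double, Integer> high = new TreeMap<>();
    private int lowSize = 0, highSize = 0;

    private static void inc(TreeMap<Double, Integer> m, double v) {
        m.merge(v, 1, Integer::sum);
    }

    private static void dec(TreeMap<Double, Integer> m, double v) {
        // drops the key when its multiplicity reaches zero
        m.merge(v, -1, (a, b) -> a + b == 0 ? null : a + b);
    }

    public void store(double v) {
        if (lowSize == 0 || v <= low.lastKey()) { inc(low, v); lowSize++; }
        else { inc(high, v); highSize++; }
        rebalance();
    }

    // assumes v is currently in the window
    public void evict(double v) {
        if (lowSize > 0 && low.containsKey(v)) { dec(low, v); lowSize--; }
        else { dec(high, v); highSize--; }
        rebalance();
    }

    private void rebalance() {
        while (lowSize > highSize + 1) {
            double v = low.lastKey();
            dec(low, v); lowSize--; inc(high, v); highSize++;
        }
        while (highSize > lowSize) {
            double v = high.firstKey();
            dec(high, v); highSize--; inc(low, v); lowSize++;
        }
    }

    public double median() {
        return lowSize == highSize
                ? (low.lastKey() + high.firstKey()) / 2.0
                : low.lastKey();
    }
}
```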
[jira] [Updated] (FLINK-2144) Implement count, average, and variance for windows
[ https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2144: --- Labels: statistics (was: ) Implement count, average, and variance for windows -- Key: FLINK-2144 URL: https://issues.apache.org/jira/browse/FLINK-2144 Project: Flink Issue Type: Sub-task Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Labels: statistics By count I mean the number of elements in the window. These can be implemented very efficiently building on FLINK-2143.
[jira] [Updated] (FLINK-2144) Implement count, average, and variance for windows
[ https://issues.apache.org/jira/browse/FLINK-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2144: --- Description: By count I mean the number of elements in the window. These can be implemented very efficiently building on FLINK-2143: Store: O(1) Evict: O(1) emitWindow: O(1) memory: O(1) was: By count I mean the number of elements in the window. These can be implemented very efficiently building on FLINK-2143. Implement count, average, and variance for windows -- Key: FLINK-2144 URL: https://issues.apache.org/jira/browse/FLINK-2144 Project: Flink Issue Type: Sub-task Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Labels: statistics By count I mean the number of elements in the window. These can be implemented very efficiently building on FLINK-2143: Store: O(1) Evict: O(1) emitWindow: O(1) memory: O(1)
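The constant-time PreReducer described above boils down to maintaining a running count, sum, and sum of squares; a minimal sketch (editor's illustration, not the Flink PreReducer API):

```java
// O(1) store, evict, and emit: count, sum, and sum of squares suffice to
// report count, average, and (population) variance of the current window.
public class WindowStats {
    private long count;
    private double sum, sumSq;

    public void store(double v) { count++; sum += v; sumSq += v * v; }

    // assumes v was previously stored
    public void evict(double v) { count--; sum -= v; sumSq -= v * v; }

    public long count() { return count; }

    public double average() { return sum / count; }

    // population variance; the subtraction can lose precision on
    // long-running windows (Welford-style updates are more robust)
    public double variance() {
        double m = average();
        return sumSq / count - m * m;
    }
}
```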
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570572#comment-14570572 ] ASF GitHub Bot commented on FLINK-1707: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/649#issuecomment-108284020 Hey @joey001! Are you still working on this? Let us know if you need help! Add an Affinity Propagation Library Method -- Key: FLINK-1707 URL: https://issues.apache.org/jira/browse/FLINK-1707 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: joey Priority: Minor This issue proposes adding an implementation of the Affinity Propagation algorithm as a Gelly library method and a corresponding example. The algorithm is described in paper [1] and a description of a vertex-centric implementation can be found in [2]. [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
[jira] [Created] (FLINK-2149) Simplify Gelly Jaccard similarity example
Vasia Kalavri created FLINK-2149: Summary: Simplify Gelly Jaccard similarity example Key: FLINK-2149 URL: https://issues.apache.org/jira/browse/FLINK-2149 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Priority: Trivial The Gelly Jaccard similarity example can be simplified by replacing the groupReduceOnEdges method with the simpler reduceOnEdges.
[jira] [Updated] (FLINK-1759) Execution statistics for vertex-centric iterations
[ https://issues.apache.org/jira/browse/FLINK-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-1759: - Labels: (was: easyfix starter) Execution statistics for vertex-centric iterations -- Key: FLINK-1759 URL: https://issues.apache.org/jira/browse/FLINK-1759 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Priority: Minor It would be nice to add an option for gathering execution statistics from VertexCentricIteration. In particular, the following metrics could be useful: - total number of supersteps - number of messages sent (total / per superstep) - bytes of messages exchanged (total / per superstep) - execution time (total / per superstep)
[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...
Github user FGoessler commented on the pull request: https://github.com/apache/flink/pull/700#issuecomment-108310843 The travis build is failing on Oracle JDK 8. Maven or Flink are hanging according to the build log. Can anyone help or at least restart the build? Are there any known flipping tests? Imo the failure isn't related to our changes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570676#comment-14570676 ] ASF GitHub Bot commented on FLINK-1731: --- Github user FGoessler commented on the pull request: https://github.com/apache/flink/pull/700#issuecomment-108310843 The travis build is failing on Oracle JDK 8. Maven or Flink are hanging according to the build log. Can anyone help or at least restart the build? Are there any known flipping tests? Imo the failure isn't related to our changes. Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Peter Schrott Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf
[GitHub] flink pull request: [FLINK-1707][WIP]Add an Affinity Propagation L...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/649#issuecomment-108284020 Hey @joey001! Are you still working on this? Let us know if you need help!
[GitHub] flink pull request: [contrib] Storm compatibility
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/764#issuecomment-108292671 I thought this is a clean branch. I am working on this currently...
[GitHub] flink pull request: [FLINK-1993] [ml] Replaces custom SGD in Multi...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604864 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala --- @@ -309,8 +207,10 @@ object MultipleLinearRegression { : DataSet[LabeledVector] = { --- End diff -- Docstring for the return type?
[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570509#comment-14570509 ] ASF GitHub Bot commented on FLINK-1993: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604864 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala --- @@ -309,8 +207,10 @@ object MultipleLinearRegression { : DataSet[LabeledVector] = { --- End diff -- Docstring for the return type? Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains a SGD optimizer which should replace the custom implementation once the framework is merged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2137) Expose partitionByHash for WindowedDataStream
[ https://issues.apache.org/jira/browse/FLINK-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570528#comment-14570528 ] Márton Balassi commented on FLINK-2137: --- I am personally fine with not having it; if there are no objections, please mark it as not a problem. Expose partitionByHash for WindowedDataStream - Key: FLINK-2137 URL: https://issues.apache.org/jira/browse/FLINK-2137 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann This functionality has been recently exposed for DataStreams and ConnectedDataStreams, but not for WindowedDataStreams yet.
[jira] [Assigned] (FLINK-2136) Test the streaming scala API
[ https://issues.apache.org/jira/browse/FLINK-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gábor Hermann reassigned FLINK-2136: Assignee: Gábor Hermann Test the streaming scala API Key: FLINK-2136 URL: https://issues.apache.org/jira/browse/FLINK-2136 Project: Flink Issue Type: Test Components: Scala API, Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann There are no tests covering the streaming Scala API. I would suggest testing whether the StreamGraph created by a certain operation looks as expected. Deeper layers and the runtime should not be tested here; that is done in streaming-core.
[jira] [Commented] (FLINK-1526) Add Minimum Spanning Tree library method and example
[ https://issues.apache.org/jira/browse/FLINK-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570554#comment-14570554 ] ASF GitHub Bot commented on FLINK-1526: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/434#issuecomment-108278253 Hey @andralungu! I think we should close this one. We can't really continue from this state anyway. I guess we'll have to revisit this problem once we have for-loop iteration support. Add Minimum Spanning Tree library method and example Key: FLINK-1526 URL: https://issues.apache.org/jira/browse/FLINK-1526 Project: Flink Issue Type: Task Components: Gelly Reporter: Vasia Kalavri Assignee: Andra Lungu This issue proposes the addition of a library method and an example for distributed minimum spanning tree in Gelly. The DMST algorithm is very interesting because it is quite different from PageRank-like iterative graph algorithms. It consists of distinct phases inside the same iteration and requires a mechanism to detect convergence of one phase to proceed to the next one. Current implementations in vertex-centric models are quite long (1000 lines) and hard to understand. You can find a description of the algorithm [here | http://ilpubs.stanford.edu:8090/1077/3/p535-salihoglu.pdf] and [here | http://www.vldb.org/pvldb/vol7/p1047-han.pdf]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2146) Fast calculation of min/max with arbitrary eviction and triggers
Gabor Gevay created FLINK-2146: -- Summary: Fast calculation of min/max with arbitrary eviction and triggers Key: FLINK-2146 URL: https://issues.apache.org/jira/browse/FLINK-2146 Project: Flink Issue Type: Sub-task Reporter: Gabor Gevay Priority: Minor The last algorithm described here could be used: http://codercareer.blogspot.com/2012/02/no-33-maximums-in-sliding-windows.html It is based on a double-ended queue which maintains a sorted list of elements of the current window that have the possibility of being the maximal element in the future. Store: O(1) amortized Evict: O(1) emitWindow: O(1) memory: O(N)
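The deque-based algorithm referenced above can be sketched as follows (illustrative Java, independent of Flink's windowing API). The deque holds indices of elements that could still become the window maximum, kept in decreasing value order, which gives the O(1) amortized store/evict and O(N) memory bounds mentioned in the issue:

```java
import java.util.ArrayDeque;

// Sliding-window maximum with a monotonic double-ended queue.
// Standalone sketch, not Flink code.
class SlidingWindowMax {
    static int[] maxPerWindow(int[] a, int w) {
        int[] out = new int[a.length - w + 1];
        // Indices of candidate maxima; their values are strictly decreasing.
        ArrayDeque<Integer> dq = new ArrayDeque<>();
        for (int i = 0; i < a.length; i++) {
            // Evict the front index once it falls out of the window.
            while (!dq.isEmpty() && dq.peekFirst() <= i - w) dq.pollFirst();
            // Drop elements dominated by the new one; they can never be the max.
            while (!dq.isEmpty() && a[dq.peekLast()] <= a[i]) dq.pollLast();
            dq.addLast(i);
            // Emit the current maximum once the first full window is formed.
            if (i >= w - 1) out[i - w + 1] = a[dq.peekFirst()];
        }
        return out;
    }
}
```

Each index enters and leaves the deque at most once, so the total work over N elements is O(N), i.e. O(1) amortized per store/evict.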
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570601#comment-14570601 ] PJ Van Aeken commented on FLINK-1962: - [~ssc], you can find an implementation to play with in my fork (branch scala-gelly-api). It has all of the functionalities from the Java API except for a few utility methods for creating graphs, and I am also still working on Vertex Centric Iterations and Gather Sum Apply Iterations. Other than that most of it should be there, although I am a couple of commits behind. Add Gelly Scala API --- Key: FLINK-1962 URL: https://issues.apache.org/jira/browse/FLINK-1962 Project: Flink Issue Type: Task Components: Gelly, Scala API Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: PJ Van Aeken
[GitHub] flink pull request: Implemented TwitterSourceFilter and adapted Tw...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/695#issuecomment-108290010 Thanks for your work. :smile:
[jira] [Commented] (FLINK-2048) Enhance Twitter Stream support
[ https://issues.apache.org/jira/browse/FLINK-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570605#comment-14570605 ] ASF GitHub Bot commented on FLINK-2048: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/695 Enhance Twitter Stream support -- Key: FLINK-2048 URL: https://issues.apache.org/jira/browse/FLINK-2048 Project: Flink Issue Type: Task Components: Streaming Affects Versions: master Reporter: Hilmi Yildirim Assignee: Hilmi Yildirim Original Estimate: 2h Remaining Estimate: 2h Flink does not have real Twitter support. It only has a TwitterSource which uses a sample stream that cannot be used properly for analysis. It is possible to use external tools to create streams (e.g. Kafka), but it is beneficial to create a proper Twitter stream in Flink.
[GitHub] flink pull request: Implemented TwitterSourceFilter and adapted Tw...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/695
[jira] [Updated] (FLINK-1759) Execution statistics for vertex-centric iterations
[ https://issues.apache.org/jira/browse/FLINK-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-1759: - Labels: easyfix starter (was: ) Execution statistics for vertex-centric iterations -- Key: FLINK-1759 URL: https://issues.apache.org/jira/browse/FLINK-1759 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Priority: Minor Labels: easyfix, starter It would be nice to add an option for gathering execution statistics from VertexCentricIteration. In particular, the following metrics could be useful: - total number of supersteps - number of messages sent (total / per superstep) - bytes of messages exchanged (total / per superstep) - execution time (total / per superstep) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [contrib] Storm compatibility
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/764#issuecomment-108292481 You can just add one commit at the end; please do not rewrite the complete history for this. That is unnecessary overhead. If you feel confident about it we can even put it in the 0.9 release.
[jira] [Commented] (FLINK-1528) Add local clustering coefficient library method and example
[ https://issues.apache.org/jira/browse/FLINK-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570567#comment-14570567 ] ASF GitHub Bot commented on FLINK-1528: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/420#issuecomment-108283573 Hey @balidani! Would you like to finish this up? It's not really urgent, but it's almost finished and it'd be a pity to abandon :) Someone else could also take over of course. Just let us know! Add local clustering coefficient library method and example --- Key: FLINK-1528 URL: https://issues.apache.org/jira/browse/FLINK-1528 Project: Flink Issue Type: Task Components: Gelly Reporter: Vasia Kalavri Assignee: Daniel Bali Add a Gelly library method and example to compute the local clustering coefficient.
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570607#comment-14570607 ] Vasia Kalavri commented on FLINK-1962: -- Awesome news [~vanaepi]! Add Gelly Scala API --- Key: FLINK-1962 URL: https://issues.apache.org/jira/browse/FLINK-1962 Project: Flink Issue Type: Task Components: Gelly, Scala API Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: PJ Van Aeken
[GitHub] flink pull request: [streaming] Consolidate streaming API method n...
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/761#issuecomment-108295020 Merging.
[GitHub] flink pull request: Hits
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31596533 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/utils/Hits.java --- @@ -0,0 +1,14 @@ +package org.apache.flink.graph.utils; --- End diff -- Missing Apache license header
[GitHub] flink pull request: Hits
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/765#discussion_r31596508 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/HITSExample.java --- @@ -0,0 +1,113 @@ +package org.apache.flink.graph.example; --- End diff -- Missing Apache license header
[GitHub] flink pull request: [FLINK-1993] [ml] Replaces custom SGD in Multi...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604529 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala --- @@ -87,11 +89,11 @@ import org.apache.flink.ml.pipeline.{FitOperation, PredictOperation, Predictor} * */ class MultipleLinearRegression extends Predictor[MultipleLinearRegression] { - + import org.apache.flink.ml._ --- End diff -- Line 49 typo: iteratinos -> iterations
[jira] [Commented] (FLINK-2130) RabbitMQ source does not fail when failing to retrieve elements
[ https://issues.apache.org/jira/browse/FLINK-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570522#comment-14570522 ] ASF GitHub Bot commented on FLINK-2130: --- GitHub user mbalassi opened a pull request: https://github.com/apache/flink/pull/767 [FLINK-2130] [streaming] RMQ Source properly propagates exceptions You can merge this pull request into a Git repository by running: $ git pull https://github.com/mbalassi/flink flink-2130 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/767.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #767 commit 2b784f09493767ca5b6388ac692406466dc55575 Author: mbalassi mbala...@apache.org Date: 2015-06-03T09:16:48Z [FLINK-2130] [streaming] RMQ Source properly propagates exceptions RabbitMQ source does not fail when failing to retrieve elements --- Key: FLINK-2130 URL: https://issues.apache.org/jira/browse/FLINK-2130 Project: Flink Issue Type: Bug Components: Streaming, Streaming Connectors Reporter: Stephan Ewen Assignee: Márton Balassi The RMQ source only logs when elements cannot be retrieved. Failures are not propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2137) Expose partitionByHash for WindowedDataStream
[ https://issues.apache.org/jira/browse/FLINK-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570525#comment-14570525 ] Gyula Fora commented on FLINK-2137: --- I don't think this makes too much sense for the windowing case. The groupBy with keyselector should be enough. Expose partitionByHash for WindowedDataStream - Key: FLINK-2137 URL: https://issues.apache.org/jira/browse/FLINK-2137 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann This functionality has been recently exposed for DataStreams and ConnectedDataStreams, but not for WindowedDataStreams yet.
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570367#comment-14570367 ] ASF GitHub Bot commented on FLINK-1319: --- Github user twalthr commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31597259 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java --- @@ -54,8 +54,11 @@ private Map<String, DataSet<?>> broadcastVariables; + // NOTE: only set this variable via setSemanticProperties() --- End diff -- Manual annotations should always trump optimizer annotations. The analyzer can not determine all semantic properties. E.g. when using KeySelectors. The user should still have the possibility to override semantic properties to add more properties. Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}. We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. 
I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user twalthr commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31597259 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java --- @@ -54,8 +54,11 @@ private Map<String, DataSet<?>> broadcastVariables; + // NOTE: only set this variable via setSemanticProperties() --- End diff -- Manual annotations should always trump optimizer annotations. The analyzer can not determine all semantic properties. E.g. when using KeySelectors. The user should still have the possibility to override semantic properties to add more properties.
[GitHub] flink pull request: [docs/javadoc][hotfix] Corrected Join hint and...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/763#issuecomment-108262606 looks good, +1.
[GitHub] flink pull request: [FLINK-2098] Improvements on checkpoint-aligne...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/755#issuecomment-108218506 @StephanEwen I took the changes, added them on top of my PR and added some more refinements.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570392#comment-14570392 ] ASF GitHub Bot commented on FLINK-2098: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/755#issuecomment-108218506 @StephanEwen I took the changes, added them on top of my PR and added some more refinements. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570492#comment-14570492 ] ASF GitHub Bot commented on FLINK-1993: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604106 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala --- @@ -35,7 +35,7 @@ import org.apache.flink.ml.common.{FlinkMLTools, ParameterMap, WithParameters} * * @tparam Self Type of the implementing class */ -trait Predictor[Self] extends Estimator[Self] with WithParameters with Serializable { +trait Predictor[Self] extends Estimator[Self] with WithParameters { --- End diff -- Why is Serializable no longer needed? Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains a SGD optimizer which should replace the custom implementation once the framework is merged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1993] [ml] Replaces custom SGD in Multi...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604106 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala --- @@ -35,7 +35,7 @@ import org.apache.flink.ml.common.{FlinkMLTools, ParameterMap, WithParameters} * * @tparam Self Type of the implementing class */ -trait Predictor[Self] extends Estimator[Self] with WithParameters with Serializable { +trait Predictor[Self] extends Estimator[Self] with WithParameters { --- End diff -- Why is Serializable no longer needed?
[jira] [Closed] (FLINK-2137) Expose partitionByHash for WindowedDataStream
[ https://issues.apache.org/jira/browse/FLINK-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-2137. - Resolution: Not A Problem Expose partitionByHash for WindowedDataStream - Key: FLINK-2137 URL: https://issues.apache.org/jira/browse/FLINK-2137 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann This functionality has been recently exposed for DataStreams and ConnectedDataStreams, but not for WindowedDataStreams yet.
[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570503#comment-14570503 ] ASF GitHub Bot commented on FLINK-1993: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/760#discussion_r31604529 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/regression/MultipleLinearRegression.scala --- @@ -87,11 +89,11 @@ import org.apache.flink.ml.pipeline.{FitOperation, PredictOperation, Predictor} * */ class MultipleLinearRegression extends Predictor[MultipleLinearRegression] { - + import org.apache.flink.ml._ --- End diff -- Line 49 typo: iteratinos -> iterations Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains a SGD optimizer which should replace the custom implementation once the framework is merged.
[jira] [Created] (FLINK-2143) Add an overload to reduceWindow which takes the inverse of the reduceFunction as a second parameter
Gabor Gevay created FLINK-2143: -- Summary: Add an overload to reduceWindow which takes the inverse of the reduceFunction as a second parameter Key: FLINK-2143 URL: https://issues.apache.org/jira/browse/FLINK-2143 Project: Flink Issue Type: Sub-task Reporter: Gabor Gevay Assignee: Gabor Gevay If the inverse of the reduceFunction is also available (for example subtraction when summing numbers), then a PreReducer can maintain the aggregate in O(1) memory and O(1) time for evict, store, and emitWindow.
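A minimal sketch of the idea behind FLINK-2143 (class and method names here are illustrative, not part of Flink's API): when the inverse of the reduce function is known, the window aggregate can be updated in O(1) on store by applying the reduce function, and in O(1) on evict by applying the inverse, instead of re-reducing the whole window.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BinaryOperator;

// Hypothetical PreReducer-style helper: maintains a window aggregate
// incrementally, assuming `inverse` undoes `reduce` (e.g. subtraction
// undoing addition).
public class InvertibleWindowReducer<T> {
    private final BinaryOperator<T> reduce;   // e.g. (a, b) -> a + b
    private final BinaryOperator<T> inverse;  // e.g. (agg, b) -> agg - b
    private final Deque<T> window = new ArrayDeque<>();
    private T aggregate;                      // null means empty window

    public InvertibleWindowReducer(BinaryOperator<T> reduce, BinaryOperator<T> inverse) {
        this.reduce = reduce;
        this.inverse = inverse;
    }

    public void store(T element) {            // O(1) per element
        window.addLast(element);
        aggregate = (aggregate == null) ? element : reduce.apply(aggregate, element);
    }

    public void evict() {                     // O(1): removes the oldest element
        T oldest = window.removeFirst();
        aggregate = window.isEmpty() ? null : inverse.apply(aggregate, oldest);
    }

    public T emitWindow() {                   // current window aggregate
        return aggregate;
    }
}
```

For summing numbers, `reduce` would be `Integer::sum` and `inverse` would be `(agg, x) -> agg - x`; without the inverse, eviction would force an O(n) re-reduce of the remaining elements.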
[jira] [Commented] (FLINK-2030) Implement an online histogram with Merging and equalization features
[ https://issues.apache.org/jira/browse/FLINK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570462#comment-14570462 ] Theodore Vasiloudis commented on FLINK-2030: Is there a PR for this issue? Implement an online histogram with Merging and equalization features Key: FLINK-2030 URL: https://issues.apache.org/jira/browse/FLINK-2030 Project: Flink Issue Type: Sub-task Components: Machine Learning Library Reporter: Sachin Goel Assignee: Sachin Goel Priority: Minor Labels: ML For the implementation of the decision tree in https://issues.apache.org/jira/browse/FLINK-1727, we need to implement a histogram with online updates, merging and equalization features. A reference implementation is provided in [1]. [1] http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
[GitHub] flink pull request: [contrib] Storm compatibility
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/764#issuecomment-108260360 What exactly is broken with the commits? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-2141) Allow GSA's Gather to perform this operation in more than one direction
Andra Lungu created FLINK-2141: -- Summary: Allow GSA's Gather to perform this operation in more than one direction Key: FLINK-2141 URL: https://issues.apache.org/jira/browse/FLINK-2141 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 0.9 Reporter: Andra Lungu For the time being, a vertex only gathers information from its in-edges. Similarly to the vertex-centric approach, we would like to allow users to gather data from out and all edges as well. This property should be set using a setDirection() method.
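To make the proposal concrete, here is a hedged sketch (not Gelly's actual API; the enum and method are illustrative) of what a configurable gather direction could mean: depending on the setting, a vertex collects values over its in-edges, its out-edges, or both.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: gathers neighbor vertex ids over an edge list,
// in the direction selected by a hypothetical EdgeDirection setting.
public class GatherDirectionSketch {
    public enum EdgeDirection { IN, OUT, ALL }

    // edges[i] = {source, target}
    public static List<Integer> gatherNeighbors(int vertex, int[][] edges, EdgeDirection dir) {
        List<Integer> neighbors = new ArrayList<>();
        for (int[] e : edges) {
            // in-edge of `vertex`: target == vertex; out-edge: source == vertex
            if ((dir == EdgeDirection.IN || dir == EdgeDirection.ALL) && e[1] == vertex) {
                neighbors.add(e[0]);
            }
            if ((dir == EdgeDirection.OUT || dir == EdgeDirection.ALL) && e[0] == vertex) {
                neighbors.add(e[1]);
            }
        }
        return neighbors;
    }
}
```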
[GitHub] flink pull request: [FLINK-1993] [ml] Replaces custom SGD in Multi...
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/760#issuecomment-108254611 Looks good, some minor comments.
[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570511#comment-14570511 ] ASF GitHub Bot commented on FLINK-1993: --- Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/760#issuecomment-108254611 Looks good, some minor comments. Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains an SGD optimizer which should replace the custom implementation once the framework is merged.
[jira] [Created] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
Gabor Gevay created FLINK-2142: -- Summary: GSoC project: Exact and Approximate Statistics for Data Streams and Windows Key: FLINK-2142 URL: https://issues.apache.org/jira/browse/FLINK-2142 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor The goal of this project is to implement basic statistics of data streams and windows (like average, median, variance, correlation, etc.) in a computationally efficient manner. This involves designing custom preaggregators. The exact calculation of some statistics (eg. frequencies, or the number of distinct elements) would require memory proportional to the number of elements in the input (the window or the entire stream). However, there are efficient algorithms and data structures using less memory for calculating the same statistics only approximately, with user-specified error bounds.
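As one example of the sublinear-memory structures the description alludes to for approximate frequencies, here is a minimal count-min sketch (a standard technique; the class and its parameters are illustrative, not part of any Flink API). It uses a fixed depth x width array of counters instead of one counter per distinct element, and its estimates never undercount the true frequency.

```java
import java.util.Random;

// Minimal count-min sketch: approximate frequency counting in
// O(depth * width) memory, independent of the number of elements seen.
public class CountMinSketch {
    private final int depth;
    private final int width;
    private final long[][] counts;
    private final int[] seeds;   // one hash seed per row

    public CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.counts = new long[depth][width];
        this.seeds = new int[depth];
        Random rnd = new Random(42);          // fixed seed for reproducibility
        for (int i = 0; i < depth; i++) {
            seeds[i] = rnd.nextInt();
        }
    }

    private int bucket(Object item, int row) {
        int h = item.hashCode() ^ seeds[row];
        h ^= (h >>> 16);                      // spread the hash bits a little
        return Math.floorMod(h, width);
    }

    public void add(Object item) {
        for (int i = 0; i < depth; i++) {
            counts[i][bucket(item, i)]++;
        }
    }

    // Never underestimates the true frequency; may overestimate on collisions.
    public long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++) {
            min = Math.min(min, counts[i][bucket(item, i)]);
        }
        return min;
    }
}
```

The error bound is tunable through depth and width, which matches the "user-specified error bounds" mentioned in the project description.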
[jira] [Created] (FLINK-2144) Implement count, average, and variance for windows
Gabor Gevay created FLINK-2144: -- Summary: Implement count, average, and variance for windows Key: FLINK-2144 URL: https://issues.apache.org/jira/browse/FLINK-2144 Project: Flink Issue Type: Sub-task Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor By count I mean the number of elements in the window. These can be implemented very efficiently building on FLINK-2143.
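A sketch of why these statistics build naturally on FLINK-2143 (names are illustrative, not Flink API): count, sum, and sum of squares all have inverses, so each can be updated in O(1) both when an element enters the window and when one is evicted, and average and variance are derived from them.

```java
// Hypothetical window statistics holder: O(1) store and evict, using the
// inverse-update idea from FLINK-2143 (addition undone by subtraction).
public class WindowStatistics {
    private long count;
    private double sum;
    private double sumOfSquares;

    public void store(double x) {
        count++;
        sum += x;
        sumOfSquares += x * x;
    }

    public void evict(double x) {             // x must be a previously stored element
        count--;
        sum -= x;
        sumOfSquares -= x * x;
    }

    public long count() {
        return count;
    }

    public double average() {
        return sum / count;
    }

    // Population variance via E[X^2] - (E[X])^2
    public double variance() {
        double mean = sum / count;
        return sumOfSquares / count - mean * mean;
    }
}
```

One caveat worth noting: the sum-of-squares formulation can lose floating-point precision on long-running windows with large values, so a production implementation might prefer a numerically stabler update scheme.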
[jira] [Commented] (FLINK-1967) Introduce (Event)time in Streaming
[ https://issues.apache.org/jira/browse/FLINK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570389#comment-14570389 ] Aljoscha Krettek commented on FLINK-1967: - Are we still interested in having this in the next version? If yes, the windowing system will have to be reworked, meaning the whole shebang: policies, operators, window optimisations ... The problem is that the current model only works when elements arrive in order, while working on user timestamps would require out-of-order processing. Introduce (Event)time in Streaming -- Key: FLINK-1967 URL: https://issues.apache.org/jira/browse/FLINK-1967 Project: Flink Issue Type: Improvement Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek This requires introducing a timestamp in streaming record and a change in the sources to add timestamps to records. This will also introduce punctuations (or low watermarks) to allow windows to work correctly on unordered, timestamped input data. In the process of this, the windowing subsystem also needs to be adapted to use the punctuations. Furthermore, all operators need to be made aware of punctuations and correctly forward them. Then, a new operator must be introduced to allow modification of timestamps.
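A small sketch of the punctuation/low-watermark mechanism the issue describes (this is not Flink's implementation; the class is illustrative): elements are buffered until a watermark promises that no element with a smaller timestamp will arrive, at which point everything up to the watermark can be processed in timestamp order, even if it arrived out of order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative watermark buffer: holds back out-of-order elements and
// releases them in timestamp order once a watermark guarantees completeness.
public class WatermarkBuffer<T> {
    private static class Timestamped<T> {
        final long timestamp;
        final T value;
        Timestamped(long timestamp, T value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final PriorityQueue<Timestamped<T>> buffer =
        new PriorityQueue<>((a, b) -> Long.compare(a.timestamp, b.timestamp));

    public void element(long timestamp, T value) {
        buffer.add(new Timestamped<>(timestamp, value));
    }

    // Releases all buffered elements with timestamp <= watermark, in order.
    public List<T> watermark(long watermark) {
        List<T> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp <= watermark) {
            ready.add(buffer.poll().value);
        }
        return ready;
    }
}
```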
[jira] [Created] (FLINK-2140) Access the number of vertices from within the GSA functions
Andra Lungu created FLINK-2140: -- Summary: Access the number of vertices from within the GSA functions Key: FLINK-2140 URL: https://issues.apache.org/jira/browse/FLINK-2140 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 0.9 Reporter: Andra Lungu Similarly to the vertex-centric approach, we would like to allow the user to access the number of vertices from the Gather, Sum and Apply functions respectively. This property will become available by setting the numVertices option to true via setOptNumVertices(). The number of vertices can then be accessed in the gather, sum and apply functions using the getNumberOfVertices() method. If the option is not set in the configuration, this method will return -1.
[jira] [Assigned] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Theodore Vasiloudis reassigned FLINK-1993: -- Assignee: Till Rohrmann (was: Theodore Vasiloudis) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Till Rohrmann Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains an SGD optimizer which should replace the custom implementation once the framework is merged.
[jira] [Updated] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
[ https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-2142: --- Description: The goal of this project is to implement basic statistics of data streams and windows (like average, median, variance, correlation, etc.) in a computationally efficient manner. This involves designing custom PreReducers. The exact calculation of some statistics (eg. frequencies, or the number of distinct elements) would require memory proportional to the number of elements in the input (the window or the entire stream). However, there are efficient algorithms and data structures using less memory for calculating the same statistics only approximately, with user-specified error bounds. was: The goal of this project is to implement basic statistics of data streams and windows (like average, median, variance, correlation, etc.) in a computationally efficient manner. This involves designing custom preaggregators. The exact calculation of some statistics (eg. frequencies, or the number of distinct elements) would require memory proportional to the number of elements in the input (the window or the entire stream). However, there are efficient algorithms and data structures using less memory for calculating the same statistics only approximately, with user-specified error bounds. GSoC project: Exact and Approximate Statistics for Data Streams and Windows --- Key: FLINK-2142 URL: https://issues.apache.org/jira/browse/FLINK-2142 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Labels: gsoc2015, statistics, streaming The goal of this project is to implement basic statistics of data streams and windows (like average, median, variance, correlation, etc.) in a computationally efficient manner. This involves designing custom PreReducers. The exact calculation of some statistics (eg. 
frequencies, or the number of distinct elements) would require memory proportional to the number of elements in the input (the window or the entire stream). However, there are efficient algorithms and data structures using less memory for calculating the same statistics only approximately, with user-specified error bounds.
[GitHub] flink pull request: [FLINK-1526][gelly] [work in progress] Added M...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/434#issuecomment-108278253 Hey @andralungu! I think we should close this one. We can't really continue from this state anyway. I guess we'll have to revisit this problem once we have for-loop iteration support.
[GitHub] flink pull request: [FLINK-1528][Gelly] Added Local Clustering Coe...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/420#issuecomment-108283573 Hey @balidani! Would you like to finish this up? It's not really urgent, but it's almost finished and it'd be a pity to abandon :) Someone else could also take over of course. Just let us know!