[jira] [Commented] (FLINK-5506) Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274340#comment-16274340 ] Vasia Kalavri commented on FLINK-5506:
--
I only had a quick look at the code; I will need to re-read the paper to make sure the algorithm semantics are correct with the following: I believe the problem is line 147 in {{CommunityDetection.java}}. The code assumes we have received only positive scores, while negative ones are indeed possible. Changing this line to {{double maxScore = -Double.MAX_VALUE;}} should fix it.

> Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
>
> Key: FLINK-5506
> URL: https://issues.apache.org/jira/browse/FLINK-5506
> Project: Flink
> Issue Type: Bug
> Components: Gelly
> Affects Versions: 1.1.4, 1.3.2, 1.4.1
> Reporter: Miguel E. Coimbra
> Labels: easyfix, newbie
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> Reporting this here as per Vasia's advice.
> I am having the following problem while trying out the
> org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly API (Java).
> Specs: JDK 1.8.0_102 x64
> Apache Flink: 1.1.4
> Suppose I have a very small (I tried an example with 38 vertices as well)
> dataset stored in a tab-separated file 3-vertex.tsv:
> {code}
> # id1	id2	score
> 0	1	0
> 0	2	0
> 0	3	0
> {code}
> This is just a central vertex with 3 neighbors (disconnected between themselves).
> I am loading the dataset and executing the algorithm with the following code:
> {code}
> // Load the data from the .tsv file.
> final DataSet<Tuple3<Long, Long, Double>> edgeTuples = env.readCsvFile(inputPath)
>     .fieldDelimiter("\t") // fields are separated by tabs
>     .ignoreComments("#")  // comments start with "#"
>     .types(Long.class, Long.class, Double.class);
> // Generate a graph and add reverse edges (undirected).
> final Graph<Long, Long, Double> graph = Graph.fromTupleDataSet(edgeTuples,
>     new MapFunction<Long, Long>() {
>         private static final long serialVersionUID = 8713516577419451509L;
>         public Long map(Long value) {
>             return value;
>         }
>     },
>     env).getUndirected();
> // CommunityDetection parameters.
> final double hopAttenuationDelta = 0.5d;
> final int iterationCount = 10;
> // Prepare and trigger the execution.
> DataSet<Vertex<Long, Long>> vs = graph.run(new
> org.apache.flink.graph.library.CommunityDetection(iterationCount, hopAttenuationDelta)).getVertices();
> vs.print();
> {code}
> Running this code throws the following exception (note the CommunityDetection.java:158 line):
> {code}
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.NullPointerException
>     at org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158)
>     at org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389)
>     at org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218)
>     at org.apache.flink.runtime.op
> {code}
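The suspected failure mode can be reproduced without Flink. The sketch below is illustrative only, not the actual CommunityDetection code: `argMax` stands in for the label-selection loop around line 147, and `initialMax` stands in for the initializer the comment above proposes changing. With a non-negative initial maximum, an all-negative score map never beats it, so no label is selected and the null result would explain the NPE at line 158.

```java
import java.util.HashMap;
import java.util.Map;

public class MaxScoreSketch {
    /** Returns the label with the highest score, or null if no score exceeds initialMax. */
    static Long argMax(Map<Long, Double> labelScores, double initialMax) {
        double maxScore = initialMax;
        Long maxScoreLabel = null;
        for (Map.Entry<Long, Double> e : labelScores.entrySet()) {
            if (e.getValue() > maxScore) {
                maxScore = e.getValue();
                maxScoreLabel = e.getKey();
            }
        }
        return maxScoreLabel;
    }

    public static void main(String[] args) {
        Map<Long, Double> scores = new HashMap<>();
        scores.put(1L, -0.5);  // hop attenuation can drive scores below zero
        scores.put(2L, -1.5);

        // A non-negative starting max: no negative score is ever picked, result is null.
        assert argMax(scores, 0.0) == null;

        // The fix proposed in the comment: start from the most negative double.
        assert argMax(scores, -Double.MAX_VALUE) == 1L;
    }
}
```

Note that `Double.MIN_VALUE` would exhibit the same bug, since it is the smallest positive double, not the most negative one; `-Double.MAX_VALUE` is the correct lower bound.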
[GitHub] flink pull request #4179: [FLINK-6989] [gelly] Refactor examples with Output...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/4179#discussion_r124514726
--- Diff: flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/drivers/parameter/Parameter.java ---
@@ -40,6 +40,15 @@ String getUsage();
 /**
+* A hidden parameter is parsed from the command-line configuration but is
+* not printed in the usage string. This can be used for power-user options
+* not displayed to the general user.
--- End diff --
This sounds interesting. Can you give an example of when this might be useful?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #3433: [FLINK-5911] [gelly] Command-line parameters
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/3433#discussion_r108067501
--- Diff: flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/drivers/parameter/Parameter.java ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.drivers.parameter;
+
+import org.apache.flink.api.java.utils.ParameterTool;
+
+/**
+ * Encapsulates the usage and configuration of a command-line parameter.
+ *
+ * @param <T> parameter value type
+ */
+public interface Parameter<T> {
+
+	/**
+	 * An informal usage string. Parameter names are prefixed with "--".
+	 *
+	 * Optional parameters are enclosed by "[" and "]".
+	 *
+	 * Generic values are represented by all-caps with specific values enclosed
+	 * by "<" and ">".
+	 *
+	 * @return command-line usage string
+	 */
+	String getParameterization();
--- End diff --
Why not `getUsage()`?
[GitHub] flink issue #3431: [FLINK-5910] [gelly] Framework for Gelly examples
Github user vasia commented on the issue: https://github.com/apache/flink/pull/3431 Thanks! Then, it's good to go from my side.
[GitHub] flink issue #3434: [FLINK-5909] [gelly] Interface for GraphAlgorithm results
Github user vasia commented on the issue: https://github.com/apache/flink/pull/3434 Thanks @greghogan. Looks good!
[GitHub] flink issue #3434: [FLINK-5909] [gelly] Interface for GraphAlgorithm results
Github user vasia commented on the issue: https://github.com/apache/flink/pull/3434 Does `AnalyticResult` also need a method like `toVerboseString()`? Could we replace both with e.g. a `Result` type?
[jira] [Commented] (FLINK-2910) Reorganize / Combine Gelly tests
[ https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896927#comment-15896927 ] Vasia Kalavri commented on FLINK-2910: -- I think that's a good idea [~greghogan]. > Reorganize / Combine Gelly tests > > > Key: FLINK-2910 > URL: https://issues.apache.org/jira/browse/FLINK-2910 > Project: Flink > Issue Type: Test > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > Fix For: 1.3.0 > > > - Some tests that are spread out in different classes could be combined as well, > e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 > for neighborhood methods, etc. > - Testing a binary operator (i.e. union and difference) is done in two > similar tests: one is testing the expected vertex set and one the expected > edge set. This can be combined in one test per operator using > {{LocalCollectionOutputFormat<>}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
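The consolidation described in the issue can be illustrated with a Flink-free sketch. A real Gelly test would collect both DataSets via {{LocalCollectionOutputFormat<>}}; the plain-collection "operator" below is only a stand-in. The point is the test shape: one test per binary operator that asserts the expected vertex set and the expected edge set together, instead of two near-identical test classes.

```java
import java.util.Set;
import java.util.TreeSet;

public class UnionTestSketch {
    /** Plain-collection stand-in for the vertex half of Graph.union. */
    static Set<Long> unionVertices(Set<Long> a, Set<Long> b) {
        Set<Long> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }

    /** Plain-collection stand-in for the edge half of Graph.union. */
    static Set<String> unionEdges(Set<String> a, Set<String> b) {
        Set<String> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        Set<Long> v1 = Set.of(1L, 2L), v2 = Set.of(2L, 3L);
        Set<String> e1 = Set.of("1-2"), e2 = Set.of("2-3");

        // One test covers the operator: the vertex set and the edge set are
        // checked in the same place, as the issue proposes.
        assert unionVertices(v1, v2).equals(Set.of(1L, 2L, 3L));
        assert unionEdges(e1, e2).equals(Set.of("1-2", "2-3"));
    }
}
```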
[GitHub] flink issue #3434: [FLINK-5909] [gelly] Interface for GraphAlgorithm results
Github user vasia commented on the issue: https://github.com/apache/flink/pull/3434 Thanks for the clarification @greghogan. +1 from me.
[GitHub] flink issue #3434: [FLINK-5909] [gelly] Interface for GraphAlgorithm results
Github user vasia commented on the issue: https://github.com/apache/flink/pull/3434 Hi @greghogan, thank you for the PR. I didn't spot anything that needs fixing, but I'm wondering what the motivation is for adding these interfaces. I see how `toVerboseString()` is useful, but not really why `AnalyticResult` is needed. Also, why introduce `UnaryResult`, `BinaryResult`, and `TertiaryResult` instead of simply using tuple types? I also see that this PR contains no changes to the docs and that the current 1.3-SNAPSHOT docs already reflect the changes of this PR. What am I missing here?
[jira] [Comment Edited] (FLINK-4949) Refactor Gelly driver inputs
[ https://issues.apache.org/jira/browse/FLINK-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892696#comment-15892696 ] Vasia Kalavri edited comment on FLINK-4949 at 3/2/17 6:05 PM: -- Thank you [~greghogan]. I can review during the weekend. was (Author: vkalavri): Thanks you [~greghogan]. I can review during the weekend. > Refactor Gelly driver inputs > > > Key: FLINK-4949 > URL: https://issues.apache.org/jira/browse/FLINK-4949 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.2.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.3.0 > > > The Gelly drivers started as simple wrappers around library algorithms but > have grown to handle a matrix of input sources while often running multiple > algorithms and analytics with custom parameterization. > This ticket will refactor the sourcing of the input graph into separate > classes for CSV files and RMat which will simplify the inclusion of new data > sources.
[jira] [Commented] (FLINK-4949) Refactor Gelly driver inputs
[ https://issues.apache.org/jira/browse/FLINK-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892696#comment-15892696 ] Vasia Kalavri commented on FLINK-4949: -- Thanks you [~greghogan]. I can review during the weekend. > Refactor Gelly driver inputs > > > Key: FLINK-4949 > URL: https://issues.apache.org/jira/browse/FLINK-4949 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.2.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.3.0 > > > The Gelly drivers started as simple wrappers around library algorithms but > have grown to handle a matrix of input sources while often running multiple > algorithms and analytics with custom parameterization. > This ticket will refactor the sourcing of the input graph into separate > classes for CSV files and RMat which will simplify the inclusion of new data > sources.
[GitHub] flink issue #2733: [FLINK-4896] [gelly] PageRank algorithm for directed grap...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2733 Thanks for the results @greghogan! Updating the docs would be nice, otherwise +1. Do you think we should move the existing implementations to the examples (they are still relevant for demonstrating the iteration APIs) and keep this one as the only library method, since it's the fastest and most general one?
[jira] [Updated] (FLINK-2910) Reorganize / Combine Gelly tests
[ https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-2910: - Summary: Reorganize / Combine Gelly tests (was: Combine tests for binary graph operators) > Reorganize / Combine Gelly tests > > > Key: FLINK-2910 > URL: https://issues.apache.org/jira/browse/FLINK-2910 > Project: Flink > Issue Type: Test > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > Fix For: 1.3.0 > > > - Some tests that are spread out in different classes could be combined as well, > e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 > for neighborhood methods, etc. > - Testing a binary operator (i.e. union and difference) is done in two > similar tests: one is testing the expected vertex set and one the expected > edge set. This can be combined in one test per operator using > {{LocalCollectionOutputFormat<>}}
[jira] [Updated] (FLINK-2910) Combine tests for binary graph operators
[ https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-2910: - Description: - Some tests that are spread out in different classes could be combined as well, e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 for neighborhood methods, etc. - Testing a binary operator (i.e. union and difference) is done in two similar tests: one is testing the expected vertex set and one the expected edge set. This can be combined in one test per operator using {{LocalCollectionOutputFormat<>}} was:Atm, testing a binary operator (i.e. union and difference) is done in two similar tests: one is testing the expected vertex set and one the expected edge set. This can be combined in one test per operator using {{LocalCollectionOutputFormat<>}} > Combine tests for binary graph operators > > > Key: FLINK-2910 > URL: https://issues.apache.org/jira/browse/FLINK-2910 > Project: Flink > Issue Type: Test > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > Fix For: 1.3.0 > > > - Some tests that are spread out in different classes could be combined as well, > e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 > for neighborhood methods, etc. > - Testing a binary operator (i.e. union and difference) is done in two > similar tests: one is testing the expected vertex set and one the expected > edge set. This can be combined in one test per operator using > {{LocalCollectionOutputFormat<>}}
[jira] [Updated] (FLINK-2910) Combine tests for binary graph operators
[ https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-2910: - Fix Version/s: 1.3.0 > Combine tests for binary graph operators > > > Key: FLINK-2910 > URL: https://issues.apache.org/jira/browse/FLINK-2910 > Project: Flink > Issue Type: Test > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > Fix For: 1.3.0 > > > Atm, testing a binary operator (i.e. union and difference) is done in two > similar tests: one is testing the expected vertex set and one the expected > edge set. This can be combined in one test per operator using > {{LocalCollectionOutputFormat<>}}
[jira] [Commented] (FLINK-2910) Combine tests for binary graph operators
[ https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889618#comment-15889618 ] Vasia Kalavri commented on FLINK-2910: -- Thanks for the heads-up [~uce]. Yes, this is still relevant. I will update it. [~mju] are you still planning to work on this? > Combine tests for binary graph operators > > > Key: FLINK-2910 > URL: https://issues.apache.org/jira/browse/FLINK-2910 > Project: Flink > Issue Type: Test > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > > Atm, testing a binary operator (i.e. union and difference) is done in two > similar tests: one is testing the expected vertex set and one the expected > edge set. This can be combined in one test per operator using > {{LocalCollectionOutputFormat<>}}
[GitHub] flink issue #2885: [FLINK-1707] Affinity propagation
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2885 Hi @joseprupi, could you please rebase this PR? Right now it's not clear which changes are yours, which makes the PR hard to review. Thanks!
[jira] [Updated] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations
[ https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-5127: - Affects Version/s: 1.1.0 1.2.0 > Reduce the amount of intermediate data in vertex-centric iterations > --- > > Key: FLINK-5127 > URL: https://issues.apache.org/jira/browse/FLINK-5127 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.0, 1.2.0 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > Fix For: 1.3.0 > > > The vertex-centric plan contains a join between the workset (messages) and > the solution set (vertices) that outputs <Vertex, Message> tuples. This > intermediate dataset is then co-grouped with the edges to provide the Pregel > interface directly. > This issue proposes an improvement to reduce the size of this intermediate > dataset. In particular, the vertex state does not have to be attached to all > the output tuples of the join. If we replace the join with a coGroup and use > an `Either` type, we can attach the vertex state to the first tuple only. The > subsequent coGroup can retrieve the vertex state from the first tuple and > correctly expose the Pregel interface. > In my preliminary experiments, I find that this change reduces intermediate > data by 2x for small vertex state and 4-5x for large vertex states.
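The coGroup-plus-`Either` layout proposed in FLINK-5127 can be sketched without Flink. The real implementation would use `org.apache.flink.types.Either` inside the iteration's coGroup; the nested `Either` class and the `coGroupOutput` method below are illustrative stand-ins, not the actual code. The idea being demonstrated: instead of a join that attaches the (possibly large) vertex state to every `<Vertex, Message>` tuple, the state travels once as a `Left` record and each message as a lightweight `Right` record.

```java
import java.util.ArrayList;
import java.util.List;

public class EitherSketch {
    // Minimal stand-in for Flink's org.apache.flink.types.Either.
    static class Either<L, R> {
        final L left;
        final R right;
        private Either(L l, R r) { left = l; right = r; }
        static <L, R> Either<L, R> left(L l)  { return new Either<>(l, null); }
        static <L, R> Either<L, R> right(R r) { return new Either<>(null, r); }
        boolean isLeft() { return left != null; }
    }

    /**
     * Pairs one vertex's state with its incoming messages: the state is
     * attached to the first output record only, so it is not duplicated
     * across all messages as it would be in the join-based plan.
     */
    static List<Either<String, Double>> coGroupOutput(String vertexState, List<Double> messages) {
        List<Either<String, Double>> out = new ArrayList<>();
        out.add(Either.left(vertexState));   // vertex state, sent once
        for (Double m : messages) {
            out.add(Either.right(m));        // each message, without the state
        }
        return out;
    }
}
```

The downstream coGroup can then read the vertex state from the leading `Left` record and treat every following `Right` record as a plain message, which is what makes the 2x-5x intermediate-data reduction mentioned in the issue plausible for large vertex states.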
[jira] [Updated] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations
[ https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-5127: - Fix Version/s: 1.3.0 > Reduce the amount of intermediate data in vertex-centric iterations > --- > > Key: FLINK-5127 > URL: https://issues.apache.org/jira/browse/FLINK-5127 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.0, 1.2.0 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > Fix For: 1.3.0 > > > The vertex-centric plan contains a join between the workset (messages) and > the solution set (vertices) that outputs <Vertex, Message> tuples. This > intermediate dataset is then co-grouped with the edges to provide the Pregel > interface directly. > This issue proposes an improvement to reduce the size of this intermediate > dataset. In particular, the vertex state does not have to be attached to all > the output tuples of the join. If we replace the join with a coGroup and use > an `Either` type, we can attach the vertex state to the first tuple only. The > subsequent coGroup can retrieve the vertex state from the first tuple and > correctly expose the Pregel interface. > In my preliminary experiments, I find that this change reduces intermediate > data by 2x for small vertex state and 4-5x for large vertex states.
[jira] (FLINK-1526) Add Minimum Spanning Tree library method and example
Vasia Kalavri commented on FLINK-1526 Re: Add Minimum Spanning Tree library method and example Hi Xingcan Cui, the problem is that currently you cannot have an iteration (e.g. vertex-centric) inside a for-loop or a while-loop. So, your pseudocode won't work (well, it will, but only for very small inputs). I believe "no value updates" refers to no vertex values changing. Where did you see this?
[jira] (FLINK-1526) Add Minimum Spanning Tree library method and example
Vasia Kalavri commented on FLINK-1526 Re: Add Minimum Spanning Tree library method and example Hi Xingcan Cui, thank you for your interest in this issue. As you can see in the comment history, contributors have had problems completing this task without support for for-loop iterations. Are you planning to take a different approach? Could you describe how you're planning to proceed? Thanks!
[jira] [Created] (FLINK-5597) Improve the LocalClusteringCoefficient documentation
Vasia Kalavri created FLINK-5597: Summary: Improve the LocalClusteringCoefficient documentation Key: FLINK-5597 URL: https://issues.apache.org/jira/browse/FLINK-5597 Project: Flink Issue Type: Bug Components: Documentation, Gelly Reporter: Vasia Kalavri The LocalClusteringCoefficient usage section should explain what the algorithm output is and how to retrieve the actual local clustering coefficient scores from it.
[GitHub] flink issue #2885: [FLINK-1707] Affinity propagation
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2885 Hi @joseprupi, thank you for the PR! This should replace #2053, right? If yes, could you please close #2053? Thanks!
[jira] [Created] (FLINK-5434) Remove unsupported project() transformation from Scala DataStream docs
Vasia Kalavri created FLINK-5434: Summary: Remove unsupported project() transformation from Scala DataStream docs Key: FLINK-5434 URL: https://issues.apache.org/jira/browse/FLINK-5434 Project: Flink Issue Type: Bug Components: Documentation Reporter: Vasia Kalavri The Scala DataStream does not have a project() transformation, yet the docs include it as a supported operation.
[jira] [Created] (FLINK-5351) Make the TypeExtractor support functions with more than 2 inputs
Vasia Kalavri created FLINK-5351: Summary: Make the TypeExtractor support functions with more than 2 inputs Key: FLINK-5351 URL: https://issues.apache.org/jira/browse/FLINK-5351 Project: Flink Issue Type: Improvement Components: Gelly, Type Serialization System Reporter: Vasia Kalavri Currently, the TypeExtractor doesn't support functions with more than 2 inputs. We found that adding such support would be a useful feature for Gelly in FLINK-5097.
[jira] [Resolved] (FLINK-5097) The TypeExtractor is missing input type information in some Graph methods
[ https://issues.apache.org/jira/browse/FLINK-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-5097. -- Resolution: Fixed Fix Version/s: 1.2.0 > The TypeExtractor is missing input type information in some Graph methods > - > > Key: FLINK-5097 > URL: https://issues.apache.org/jira/browse/FLINK-5097 > Project: Flink > Issue Type: Bug > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > Fix For: 1.2.0 > > > The TypeExtractor is called without information about the input type in > {{mapVertices}} and {{mapEdges}} although this information can be easily > retrieved.
[jira] [Updated] (FLINK-5311) Write user documentation for BipartiteGraph
[ https://issues.apache.org/jira/browse/FLINK-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-5311: - Issue Type: Improvement (was: Bug) > Write user documentation for BipartiteGraph > --- > > Key: FLINK-5311 > URL: https://issues.apache.org/jira/browse/FLINK-5311 > Project: Flink > Issue Type: Improvement > Components: Gelly >Reporter: Ivan Mushketyk >Assignee: Ivan Mushketyk > Fix For: 1.2.0 > > > We need to add user documentation. The progress on BipartiteGraph can be > tracked in the following JIRA: > https://issues.apache.org/jira/browse/FLINK-2254
[jira] [Resolved] (FLINK-5311) Write user documentation for BipartiteGraph
[ https://issues.apache.org/jira/browse/FLINK-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-5311. -- Resolution: Fixed Fix Version/s: 1.2.0 > Write user documentation for BipartiteGraph > --- > > Key: FLINK-5311 > URL: https://issues.apache.org/jira/browse/FLINK-5311 > Project: Flink > Issue Type: Bug > Components: Gelly >Reporter: Ivan Mushketyk >Assignee: Ivan Mushketyk > Fix For: 1.2.0 > > > We need to add user documentation. The progress on BipartiteGraph can be > tracked in the following JIRA: > https://issues.apache.org/jira/browse/FLINK-2254
[GitHub] flink issue #2984: [FLINK-5311] Add user documentation for bipartite graph
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2984 Thank you @mushketyk. That's OK. I'm merging this PR.
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 Done. I'll wait for travis, then merge.
[jira] [Commented] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations
[ https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751890#comment-15751890 ] Vasia Kalavri commented on FLINK-5127: -- It'd be nice to have this for 1.2, but I don't know when I'll have time to work on it. I'm hoping this weekend. > Reduce the amount of intermediate data in vertex-centric iterations > --- > > Key: FLINK-5127 > URL: https://issues.apache.org/jira/browse/FLINK-5127 > Project: Flink > Issue Type: Improvement > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > > The vertex-centric plan contains a join between the workset (messages) and > the solution set (vertices) that outputs <Vertex, Message> tuples. This > intermediate dataset is then co-grouped with the edges to provide the Pregel > interface directly. > This issue proposes an improvement to reduce the size of this intermediate > dataset. In particular, the vertex state does not have to be attached to all > the output tuples of the join. If we replace the join with a coGroup and use > an `Either` type, we can attach the vertex state to the first tuple only. The > subsequent coGroup can retrieve the vertex state from the first tuple and > correctly expose the Pregel interface. > In my preliminary experiments, I find that this change reduces intermediate > data by 2x for small vertex state and 4-5x for large vertex states.
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 Sure, I can revert using `getMapReturnTypes`, rebase, and merge.
[GitHub] flink issue #2984: [FLINK-5311] Add user documentation for bipartite graph
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2984 Hi @mushketyk, thank you for the update! Just a couple of small things and we can merge: - Can you add a note in the beginning of the docs that bipartite graphs are only currently supported in the Gelly Java API? - I would rename the "Graph transformations" section to "Projection".
[jira] [Commented] (FLINK-5245) Add support for BipartiteGraph mutations
[ https://issues.apache.org/jira/browse/FLINK-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744495#comment-15744495 ] Vasia Kalavri commented on FLINK-5245: -- My point is not that these features are useless for bipartite graphs, but that we have to think about whether re-implementing these features specifically for bipartite graphs makes sense, e.g. because general graphs do not support them, or because we can use the knowledge that we have a bipartite graph to make the implementation more efficient. For example, projection is a transformation that can only be applied to bipartite graphs. But if all you want to do is get the degrees of your bipartite graph, can you use the available Graph methods? Or can we provide a better way to get the degrees because we know we have a bipartite graph? These are the questions we have to ask for each of the features in the list, in my opinion. > Add support for BipartiteGraph mutations > > > Key: FLINK-5245 > URL: https://issues.apache.org/jira/browse/FLINK-5245 > Project: Flink > Issue Type: Improvement > Components: Gelly >Reporter: Ivan Mushketyk >Assignee: Ivan Mushketyk > > Implement methods for adding and removing vertices and edges similarly to > Graph class. > Depends on https://issues.apache.org/jira/browse/FLINK-2254
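As a concrete illustration of the question above: the degree of a top vertex can be read in one pass over the bipartite edge list, whereas going through a projection does strictly more work and answers a different question (co-membership, not incidence). This is a plain-Python sketch, not Gelly API code; the sample edge list and all names are invented:

```python
# Plain-Python model (NOT the Gelly API): top-vertex degrees straight from
# the bipartite edge list vs. building a top projection first.
from collections import Counter
from itertools import combinations

bipartite_edges = [  # (top_id, bottom_id) pairs, e.g. (paper, author)
    ("p1", "a1"), ("p1", "a2"), ("p2", "a1"), ("p2", "a3"), ("p3", "a1"),
]

# Direct: one pass over the edges gives each paper's author count.
top_degrees = Counter(top for top, _ in bipartite_edges)

# Via projection: connect two top vertices whenever they share a bottom
# vertex -- more work, and the neighbour count means "co-authored papers",
# not "number of authors".
by_bottom = {}
for top, bottom in bipartite_edges:
    by_bottom.setdefault(bottom, set()).add(top)
projected = {pair for tops in by_bottom.values()
             for pair in combinations(sorted(tops), 2)}
```

Here `top_degrees` answers "how many bottom neighbours does each top vertex have" directly, while `projected` (3 pairs, all induced by the shared bottom vertex `a1`) is the larger one-mode structure a projection would materialize.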
[GitHub] flink issue #2984: [FLINK-5311] Add user documentation for bipartite graph
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2984 Thank you for the update @mushketyk! I still don't see any link from the Gelly guide page to the bipartite docs though. Can you please add that too? Otherwise people won't be able to find the docs :) As for the images, I think it would be nice to have one that shows how a projection works.
[jira] [Commented] (FLINK-5245) Add support for BipartiteGraph mutations
[ https://issues.apache.org/jira/browse/FLINK-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742744#comment-15742744 ] Vasia Kalavri commented on FLINK-5245: -- I don't think so. We used to have some simple examples that showcased how to create a graph in this way, but I don't think we really need such methods for the bipartite graph. That said, we should probably go through all the bipartite features and decide whether they are useful, e.g. validator and generators. Do they even make sense for bipartite graphs? Or when do they? > Add support for BipartiteGraph mutations > > > Key: FLINK-5245 > URL: https://issues.apache.org/jira/browse/FLINK-5245 > Project: Flink > Issue Type: Improvement > Components: Gelly >Reporter: Ivan Mushketyk >Assignee: Ivan Mushketyk > > Implement methods for adding and removing vertices and edges similarly to > Graph class. > Depends on https://issues.apache.org/jira/browse/FLINK-2254
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908218 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. 
For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs + + + +Graph Representation + + +A `BipartiteGraph` is represented by: + * `DataSet` of top nodes + * `DataSet` of bottom nodes + * `DataSet` of edges between top and bottom nodes + +As in the `Graph` class nodes are represented by the `Vertex` type and the same rules applies to its types and values. --- End diff -- *apply --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91907989 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. --- End diff -- a relationships => relationships --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908101 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs --- End diff -- *graphs --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908174 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs + + + +Graph Representation + + +A `BipartiteGraph` is represented by: + * `DataSet` of top nodes --- End diff -- *a DataSet... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908875 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. 
For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs + + + +Graph Representation + + +A `BipartiteGraph` is represented by: + * `DataSet` of top nodes + * `DataSet` of bottom nodes + * `DataSet` of edges between top and bottom nodes + +As in the `Graph` class nodes are represented by the `Vertex` type and the same rules applies to its types and values. + +The graph edges are represented by the `BipartiteEdge` type. An `BipartiteEdge` is defined by a top ID (the ID of the top `Vertex`), a bottom ID (the ID of the bottom `Vertex`) and an optional value. The main difference between the `Edge` and `BipartiteEdge` is that IDs of nodes it links can be of different types. Edges with no value have a `NullValue` value type. + + + +{% highlight java %} +BipartiteEdge<Long, String, Double> e = new BipartiteEdge<Long, String, Double>(1L, "id1", 0.5); + +Double weight = e.getValue(); // weight = 0.5 +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented +{% endhighlight %} + + +{% top %} + + +Graph Creation +-- + +You can create a `BipartiteGraph` in the following ways: + +* from a `DataSet` of top vertices, a `DataSet` of bottom vertices and a `DataSet` of edges: + + + +{% highlight java %} +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +DataSet<Vertex<String, Long>> topVertices = ... + +DataSet<Vertex<String, Long>> bottomVertices = ... + +DataSet<Edge<String, String, Double>> edges = ... 
+ +Graph<String, String, Long, Long, Double> graph = BipartiteGraph.fromDataSet(topVertices, bottomVertices, edges, env); +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented +{% endhighlight %} + + + + +Graph Transformations +- + + +* Projection: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and create a link between them in a new graph only if there is an intermediate bottom node both top nodes connect to in the original graph. Bottom projection is the opposite to top projection, i.e. only preserves bottom nodes and connects a pair of node if they are connected in the original graph. + +Gelly supports two sub-types of projections: simple projections and full projections. The only difference between them is what data is associated with edges in the result graph. + +In case of a simple projection each node in the result graph contains a pair of values of bipartite edges that connect nodes in the original graph: --- End diff -- *the case --- If your project is set up for it, you can
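The "simple projection" semantics quoted above (two top vertices become connected when they share a bottom vertex, and the new edge carries the pair of values of the two bipartite edges that met there) can be modeled outside Flink. A plain-Python sketch, not the Gelly API; `simple_top_projection` and the sample edges are invented for illustration:

```python
# Plain-Python model (NOT the Gelly API) of a simple top projection: for each
# bottom vertex, every pair of incident top vertices gets an edge whose value
# is the pair of values of the two original bipartite edges.
from itertools import combinations

bipartite_edges = [  # (top_id, bottom_id, value)
    ("p1", "a1", 0.5), ("p2", "a1", 0.7), ("p3", "a2", 0.1), ("p1", "a2", 0.9),
]

def simple_top_projection(edges):
    # Group incident (top, value) pairs by the bottom vertex they touch.
    by_bottom = {}
    for top, bottom, value in edges:
        by_bottom.setdefault(bottom, []).append((top, value))
    # Connect every pair of top vertices that meet at a shared bottom vertex.
    projected = []
    for incident in by_bottom.values():
        for (t1, v1), (t2, v2) in combinations(incident, 2):
            projected.append((t1, t2, (v1, v2)))  # edge value = value pair
    return projected

edges = simple_top_projection(bipartite_edges)
```

On this toy data, `a1` induces an edge between `p1` and `p2` valued `(0.5, 0.7)`, and `a2` induces one between `p3` and `p1` valued `(0.1, 0.9)`; a bottom projection would be the same routine grouped by top IDs instead.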
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91907816 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. --- End diff -- A single edge => an edge cannot connect to vertices => cannot connect *two vertices --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91907924 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. --- End diff -- a node between a top and a bottom nodes => an edge? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908589 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. 
For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs + + + +Graph Representation + + +A `BipartiteGraph` is represented by: + * `DataSet` of top nodes + * `DataSet` of bottom nodes + * `DataSet` of edges between top and bottom nodes + +As in the `Graph` class nodes are represented by the `Vertex` type and the same rules applies to its types and values. + +The graph edges are represented by the `BipartiteEdge` type. An `BipartiteEdge` is defined by a top ID (the ID of the top `Vertex`), a bottom ID (the ID of the bottom `Vertex`) and an optional value. The main difference between the `Edge` and `BipartiteEdge` is that IDs of nodes it links can be of different types. Edges with no value have a `NullValue` value type. + + + +{% highlight java %} +BipartiteEdge<Long, String, Double> e = new BipartiteEdge<Long, String, Double>(1L, "id1", 0.5); + +Double weight = e.getValue(); // weight = 0.5 +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented +{% endhighlight %} + + +{% top %} + + +Graph Creation +-- + +You can create a `BipartiteGraph` in the following ways: + +* from a `DataSet` of top vertices, a `DataSet` of bottom vertices and a `DataSet` of edges: + + + +{% highlight java %} +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +DataSet<Vertex<String, Long>> topVertices = ... + +DataSet<Vertex<String, Long>> bottomVertices = ... + +DataSet<Edge<String, String, Double>> edges = ... 
+ +Graph<String, String, Long, Long, Double> graph = BipartiteGraph.fromDataSet(topVertices, bottomVertices, edges, env); +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented --- End diff -- same as above --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908032 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): --- End diff -- *graphs --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91909237 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. 
For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs + + + +Graph Representation + + +A `BipartiteGraph` is represented by: + * `DataSet` of top nodes + * `DataSet` of bottom nodes + * `DataSet` of edges between top and bottom nodes + +As in the `Graph` class nodes are represented by the `Vertex` type and the same rules applies to its types and values. + +The graph edges are represented by the `BipartiteEdge` type. An `BipartiteEdge` is defined by a top ID (the ID of the top `Vertex`), a bottom ID (the ID of the bottom `Vertex`) and an optional value. The main difference between the `Edge` and `BipartiteEdge` is that IDs of nodes it links can be of different types. Edges with no value have a `NullValue` value type. + + + +{% highlight java %} +BipartiteEdge<Long, String, Double> e = new BipartiteEdge<Long, String, Double>(1L, "id1", 0.5); + +Double weight = e.getValue(); // weight = 0.5 +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented +{% endhighlight %} + + +{% top %} + + +Graph Creation +-- + +You can create a `BipartiteGraph` in the following ways: + +* from a `DataSet` of top vertices, a `DataSet` of bottom vertices and a `DataSet` of edges: + + + +{% highlight java %} +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +DataSet<Vertex<String, Long>> topVertices = ... + +DataSet<Vertex<String, Long>> bottomVertices = ... + +DataSet<Edge<String, String, Double>> edges = ... 
+ +Graph<String, String, Long, Long, Double> graph = BipartiteGraph.fromDataSet(topVertices, bottomVertices, edges, env); +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented +{% endhighlight %} + + + + +Graph Transformations +- + + +* Projection: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and create a link between them in a new graph only if there is an intermediate bottom node both top nodes connect to in the original graph. Bottom projection is the opposite to top projection, i.e. only preserves bottom nodes and connects a pair of node if they are connected in the original graph. + +Gelly supports two sub-types of projections: simple projections and full projections. The only difference between them is what data is associated with edges in the result graph. + +In case of a simple projection each node in the result graph contains a pair of values of bipartite edges that connect nodes in the original graph: + + + + +{% highlight java %} +ExecutionEnvironment env = Execution
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908073 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored --- End diff -- *preserves --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908249 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- +The graph edges are represented by the `BipartiteEdge` type. An `BipartiteEdge` is defined by a top ID (the ID of the top `Vertex`), a bottom ID (the ID of the bottom `Vertex`) and an optional value. The main difference between the `Edge` and `BipartiteEdge` is that IDs of nodes it links can be of different types. Edges with no value have a `NullValue` value type. --- End diff -- *A BipartiteEdge ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908641 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- +* Projection: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and create a link between them in a new graph only if there is an intermediate bottom node both top nodes connect to in the original graph. Bottom projection is the opposite to top projection, i.e. only preserves bottom nodes and connects a pair of node if they are connected in the original graph. --- End diff -- *creates ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908511 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented --- End diff -- Don't leave a TODO in the docs. Either state explicitly that BipartiteGraph currently only exists in the Java API or we should make sure to implement the Scala methods before merging this. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908677 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- +* Projection: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and create a link between them in a new graph only if there is an intermediate bottom node both top nodes connect to in the original graph. Bottom projection is the opposite to top projection, i.e. only preserves bottom nodes and connects a pair of node if they are connected in the original graph. --- End diff -- *nodes ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908769 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- +* Projection: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and create a link between them in a new graph only if there is an intermediate bottom node both top nodes connect to in the original graph. Bottom projection is the opposite to top projection, i.e. only preserves bottom nodes and connects a pair of node if they are connected in the original graph. --- End diff -- Can you add a figure to illustrate a top and a bottom projection? ---
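The top-projection semantics discussed in this thread can be illustrated outside of Gelly. The following is a minimal plain-Java sketch (a hypothetical `TopProjection` helper, not the Gelly API): two top vertices become connected in the projected graph exactly when they share at least one bottom neighbor in the original bipartite graph.

```java
import java.util.*;

// Illustrative sketch of a bipartite top projection (not the Gelly API):
// two top vertices are linked iff they share a bottom neighbor.
class TopProjection {

    // edges: each entry is {topId, bottomId}; returns projected edges as "a-b" with a < b.
    static Set<String> project(int[][] edges) {
        // Group top vertices by the bottom vertex they attach to.
        Map<Integer, List<Integer>> byBottom = new HashMap<>();
        for (int[] e : edges) {
            byBottom.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        // Every pair of top vertices sharing a bottom vertex gets an edge.
        Set<String> projected = new TreeSet<>();
        for (List<Integer> tops : byBottom.values()) {
            for (int i = 0; i < tops.size(); i++) {
                for (int j = i + 1; j < tops.size(); j++) {
                    int a = Math.min(tops.get(i), tops.get(j));
                    int b = Math.max(tops.get(i), tops.get(j));
                    projected.add(a + "-" + b);
                }
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        // Papers 1, 2, 3 (top) and authors 10, 20, 30 (bottom):
        // papers 1 and 2 share author 10, so they are linked in the projection.
        int[][] edges = {{1, 10}, {2, 10}, {2, 20}, {3, 30}};
        System.out.println(project(edges)); // [1-2]
    }
}
```

For the authorship example from the quoted docs, this is how two researchers who co-authored a paper end up directly connected in the one-mode projection, at the cost of losing which paper connected them.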
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91907660 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators --- End diff -- This shouldn't be Graph Generators I believe :) ---
[GitHub] flink issue #2564: [FLINK-2254] Add BipartiateGraph class
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2564 Thank you both for your work @mushketyk and @greghogan! Please, keep in mind that we should always add documentation for every new feature; especially a big one such as supporting a new graph type. We've added the checklist template for each new PR so that we don't forget about it :) Can you please open a JIRA to track that docs for bipartite graphs are missing? Thank you! ---
[jira] [Commented] (FLINK-1536) Graph partitioning operators for Gelly
[ https://issues.apache.org/jira/browse/FLINK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733284#comment-15733284 ] Vasia Kalavri commented on FLINK-1536: -- The idea is what [~greghogan] describes. In a distributed graph processing system, you first have to partition the graph before you perform any computation. The performance of graph algorithms greatly depends on the resulting partitioning. A bad partitioning might assign disproportionally more vertices to one partition thus hurting load balancing or it might partition the graph so that the communication required is too high (or both). Currently, we only support hash partitioning; that is, vertices are randomly assigned to workers using the hash of their id. This strategy has very low overhead and results in good load balancing unless the graphs are skewed. For more details on this problem, I suggest you read some of the papers in the literature linked in the description of the issue [~ivan.mushketyk]. > Graph partitioning operators for Gelly > -- > > Key: FLINK-1536 > URL: https://issues.apache.org/jira/browse/FLINK-1536 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Ivan Mushketyk >Priority: Minor > > Smart graph partitioning can significantly improve the performance and > scalability of graph analysis applications. Depending on the computation > pattern, a graph partitioning algorithm divides the graph into (maybe > overlapping) subgraphs, optimizing some objective. For example, if > communication is performed across graph edges, one might want to minimize the > edges that cross from one partition to another. > The problem of graph partitioning is a well studied problem and several > algorithms have been proposed in the literature. The goal of this project > would be to choose a few existing partitioning techniques and implement the > corresponding graph partitioning operators for Gelly. 
> Some related literature can be found [here| > http://www.citeulike.org/user/vasiakalavri/tag/graph-partitioning]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
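The hash-partitioning strategy described above can be sketched in a few lines of plain Java (an illustration of the idea, not Flink's actual partitioner): a vertex is assigned to a worker by hashing its id modulo the number of workers, which balances load well unless the id distribution is skewed.

```java
// Minimal sketch of hash partitioning (illustrative, not Flink internals):
// a vertex goes to the worker given by the hash of its id modulo the
// number of workers.
class HashPartitioning {

    static int assignWorker(long vertexId, int numWorkers) {
        // Math.floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(Long.hashCode(vertexId), numWorkers);
    }

    public static void main(String[] args) {
        for (long id = 0; id < 5; id++) {
            System.out.println("vertex " + id + " -> worker " + assignWorker(id, 4));
        }
    }
}
```

Smarter partitioners, as proposed in this issue, would instead optimize an objective such as minimizing cut edges rather than assigning vertices randomly.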
[GitHub] flink issue #2564: [FLINK-2254] Add BipartiateGraph class
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2564 I would go for `org.apache.flink.graph.bipartite`. I think that `bidirectional` simply suggests that each edge exists in both directions. ---
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 Hi @twalthr, thank you so much for looking into this. I'll create an issue for functions with > 2 inputs. I have replaced `createTypeInfo` with `getMapReturnTypes` where possible, but I'm getting a test failure now that I can't figure out. Please see `org.apache.flink.graph.scala.test.operations.GraphCreationWithCsvITCase#testCsvWithMapperValues`. Am I using the `getMapReturnTypes` method properly? ---
[GitHub] flink issue #2832: [FLINK-4936] [gelly] Operator names for Gelly inputs
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2832 Yes, will do in the following days, thanks! ---
[GitHub] flink issue #2764: [FLINK-5008] Update quickstart documentation
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2764 Sorry, no input from me regarding Eclipse. I've given up on it about a year ago ;) ---
[jira] [Closed] (FLINK-5161) accepting NullValue for VV in Gelly examples and GSA
[ https://issues.apache.org/jira/browse/FLINK-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri closed FLINK-5161. Resolution: Not A Problem > accepting NullValue for VV in Gelly examples and GSA > > > Key: FLINK-5161 > URL: https://issues.apache.org/jira/browse/FLINK-5161 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.3 >Reporter: wouter ligtenberg > > I made this topic a few days ago about EV, i meant VV back then, i don't know > why i suddenly thought about EV and it confused myself. > In this gelly example [1] and this GSA algorithm [2] a Vertex Value of Double > is required but never used, wouldn't it be better to change this into a > NullValue? I create a lot of data without Vertex Values and it seems to me > that it's more efficient > I'd like to hear your thoughts on this > [1] > https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/SingleSourceShortestPaths.java > [2] > https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/GSASingleSourceShortestPaths.java
[jira] [Commented] (FLINK-5161) accepting NullValue for VV in Gelly examples and GSA
[ https://issues.apache.org/jira/browse/FLINK-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695728#comment-15695728 ] Vasia Kalavri commented on FLINK-5161: -- Hi Wouter, the vertex value carries the distance from every vertex to the source. Since this is a weighted SSSP, this is of double type. Gelly examples favor simplicity and demonstrate functionality. As a user, you should use the library algorithms. And in the library algorithm that you link to, the vertex value is actually parametrized (see the last commit), so you can use any type you like.
[jira] [Closed] (FLINK-5152) accepting NullValue for EV in Gelly examples
[ https://issues.apache.org/jira/browse/FLINK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri closed FLINK-5152. Resolution: Not A Problem > accepting NullValue for EV in Gelly examples > > > Key: FLINK-5152 > URL: https://issues.apache.org/jira/browse/FLINK-5152 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.3 >Reporter: wouter ligtenberg > Fix For: 1.1.3 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > In this gelly example [1] an EdgeValue of Double is required but never used, > wouldn't it be better to change this into a NullValue? I create a lot of data > without Edge Values and it seems to me that it's more efficient > I'd like to hear your thoughts on this > [1] > https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/SingleSourceShortestPaths.java
[jira] [Commented] (FLINK-5152) accepting NullValue for EV in Gelly examples
[ https://issues.apache.org/jira/browse/FLINK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692732#comment-15692732 ] Vasia Kalavri commented on FLINK-5152: -- Hi [~otherwise777], this is an example of _weighted_ shortest paths. The edge value is added to the message in the scatter function, thus it cannot be NullValue. If you need a shortest paths implementation that ignores edge values, it should be easy to modify this example to do that.
[jira] [Created] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations
Vasia Kalavri created FLINK-5127: Summary: Reduce the amount of intermediate data in vertex-centric iterations Key: FLINK-5127 URL: https://issues.apache.org/jira/browse/FLINK-5127 Project: Flink Issue Type: Improvement Components: Gelly Reporter: Vasia Kalavri Assignee: Vasia Kalavri The vertex-centric plan contains a join between the workset (messages) and the solution set (vertices) that outputs <Vertex, Message> tuples. This intermediate dataset is then co-grouped with the edges to provide the Pregel interface directly. This issue proposes an improvement to reduce the size of this intermediate dataset. In particular, the vertex state does not have to be attached to all the output tuples of the join. If we replace the join with a coGroup and use an `Either` type, we can attach the vertex state to the first tuple only. The subsequent coGroup can retrieve the vertex state from the first tuple and correctly expose the Pregel interface. In my preliminary experiments, I find that this change reduces intermediate data by 2x for small vertex state and 4-5x for large vertex states.
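The encoding proposed in this issue can be sketched in plain Java (an illustration only, with hypothetical types; the actual change would use Flink's `Either` type and `CoGroupFunction`): instead of pairing the vertex state with every message tuple, the state travels only in the first element of each vertex's group.

```java
import java.util.*;

// Illustrative sketch (not Gelly internals): encode a vertex's message group
// as one Either.Left carrying the vertex state followed by Either.Rights
// carrying only messages, instead of attaching the state to every message.
class EitherGrouping {

    // A tiny Either: exactly one of 'left' (vertex state) or 'right' (message) is set.
    static final class Either<L, R> {
        final L left;
        final R right;
        private Either(L l, R r) { left = l; right = r; }
        static <L, R> Either<L, R> left(L l) { return new Either<>(l, null); }
        static <L, R> Either<L, R> right(R r) { return new Either<>(null, r); }
    }

    // Old layout: |messages| copies of the state. New layout: one copy per group.
    static List<Either<String, Integer>> encode(String vertexState, List<Integer> messages) {
        List<Either<String, Integer>> out = new ArrayList<>();
        out.add(Either.left(vertexState));   // vertex state attached once
        for (int m : messages) {
            out.add(Either.right(m));        // messages travel without the state
        }
        return out;
    }

    static long stateCopies(List<Either<String, Integer>> group) {
        return group.stream().filter(e -> e.left != null).count();
    }

    public static void main(String[] args) {
        List<Either<String, Integer>> group = encode("state-of-v1", Arrays.asList(7, 8, 9));
        System.out.println(stateCopies(group)); // 1
    }
}
```

The downstream consumer reads the vertex state from the first (Left) element and then processes the remaining (Right) messages, which is how the subsequent coGroup described above can still expose the full Pregel interface.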
[jira] [Commented] (FLINK-1536) Graph partitioning operators for Gelly
[ https://issues.apache.org/jira/browse/FLINK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15686870#comment-15686870 ] Vasia Kalavri commented on FLINK-1536: -- This issue does not refer to bipartite graphs, even though we could extend it. It was initially created as a Google Summer of Code project but it was abandoned. That means that you will have to do some background research for it and we will definitely need a design document or FLIP for it.
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 Hi @greghogan, I was able to fix the problem in `fromDataSet()` and `groupReduceOnEdges()` with `EdgesFunction`. For the remaining uses, I haven't found a way to pass all the input types correctly: they each involve 3 input types, while the `createTypeInfo()` method only accepts two. I have also tried extracting the input types from the wrapping functions, but that didn't work either. Any ideas? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 Thanks! Let me look into these and I'll get back to you.
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 @greghogan if the input types are known, we should pass them, yes. What other cases did you find?
[GitHub] flink pull request #2842: [FLINK-5097][gelly] Add missing input type informa...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2842#discussion_r88926689 --- Diff: flink-libraries/flink-gelly/src/test/java/org/apache/flink/graph/test/operations/TypeExtractorTest.java --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph.test.operations; + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.typeinfo.BasicTypeInfo; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.typeutils.TupleTypeInfo; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.test.TestGraphUtils; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.util.ArrayList; + +public class TypeExtractorTest { + + private Graph<Long, Long, Long> inputGraph; + + + @Before + public void setUp() throws Exception { + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + DataSet<Vertex<Long, Long>> vertices = TestGraphUtils.getLongLongVertexData(env); + DataSet<Edge<Long, Long>> edges = TestGraphUtils.getLongLongEdgeData(env); + inputGraph = Graph.fromDataSet(vertices, edges, env); + } + + public class TestGraphWithGeneric { + + public DataSet<Vertex<K, Tuple2<K, Integer>>> mapVertices(Graph<K, Long, Long> input) { + return input.mapVertices(new VertexMapper()).getVertices(); + } + + public DataSet<Edge<K, Tuple2<K, Integer>>> mapEdges(Graph<K, Long, Long> input) { + return input.mapEdges(new EdgeMapper()).getEdges(); + } + } + + @Test + public void testMapVerticesType() throws Exception { + TestGraphWithGeneric test = new TestGraphWithGeneric<>(); + + // test type extraction in mapVertices + DataSet<Vertex<Long, Tuple2<Long, Integer>>> outVertices = test.mapVertices(inputGraph); + Assert.assertEquals(true, (new TupleTypeInfo(Vertex.class, BasicTypeInfo.LONG_TYPE_INFO, + new TupleTypeInfo<Tuple2<Long, Integer>>(BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO))) + .equals(outVertices.getType())); + } + + @Test + public void testMapEdgesType() throws Exception { + 
TestGraphWithGeneric test = new TestGraphWithGeneric<>(); + + // test type extraction in mapEdges + DataSet<Edge<Long, Tuple2<Long, Integer>>> outEdges = test.mapEdges(inputGraph); + Assert.assertEquals(true, (new TupleTypeInfo(Edge.class, BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.LONG_TYPE_INFO, + new TupleTypeInfo<Tuple2<Long, Integer>>(BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO))) + .equals(outEdges.getType())); + } + + public static final class VertexMapper implements MapFunction<Vertex<K, Long>, Tuple2<K, Integer>> { + + private final Tuple2<K, Integer> outTuple = new Tuple2<>(); + + @Override + public Tuple2<K, Integer> map(Vertex<K, Long> inputVertex) throws Exception { + outTuple.setField(inputVertex.getId(), 0); + outTuple.setField(inputVertex.getValue().intValue(), 0); --- End diff -- `map()` could even be empty here, since this is never executed.
[GitHub] flink pull request #2842: [FLINK-5097][gelly] Add missing input type informa...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2842#discussion_r88926534 --- Diff: flink-libraries/flink-gelly/src/test/java/org/apache/flink/graph/test/operations/TypeExtractorTest.java --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph.test.operations; + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.typeinfo.BasicTypeInfo; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.typeutils.TupleTypeInfo; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.test.TestGraphUtils; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.util.ArrayList; + +public class TypeExtractorTest { + + private Graph<Long, Long, Long> inputGraph; + + + @Before + public void setUp() throws Exception { + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + DataSet<Vertex<Long, Long>> vertices = TestGraphUtils.getLongLongVertexData(env); + DataSet<Edge<Long, Long>> edges = TestGraphUtils.getLongLongEdgeData(env); + inputGraph = Graph.fromDataSet(vertices, edges, env); + } + + public class TestGraphWithGeneric { --- End diff -- Not necessary. I tried to create a minimal example of the reported case. The mapping methods could also be called inside the test methods. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2842: [FLINK-5097][gelly] Add missing input type informa...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2842#discussion_r88925478 --- Diff: flink-libraries/flink-gelly/src/test/java/org/apache/flink/graph/test/operations/TypeExtractorTest.java --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.test.operations; + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.typeinfo.BasicTypeInfo; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.typeutils.TupleTypeInfo; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.test.TestGraphUtils; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.util.ArrayList; + +public class TypeExtractorTest { + + private Graph<Long, Long, Long> inputGraph; + + + @Before + public void setUp() throws Exception { + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); --- End diff -- I don't think it'd make a difference. 
There is no execution.
[GitHub] flink pull request #2842: [FLINK-5097][gelly] Add missing input type info...
GitHub user vasia opened a pull request: https://github.com/apache/flink/pull/2842 [FLINK-5097][gelly] Add missing input type information to TypeExtr… I've managed to reproduce @otherwise777's error as reported in the mailing list and added a test case that failed before the change. @twalthr please take a look when you have some time, thanks! You can merge this pull request into a Git repository by running: $ git pull https://github.com/vasia/flink flink-5097 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2842.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2842 commit ea098c767151b6ba91fe54669a921e6303cd8c4d Author: vasia <va...@apache.org> Date: 2016-11-19T14:35:43Z [FLINK-5097][gelly] Add missing input type information to TypeExtractor
[jira] [Commented] (FLINK-1536) Graph partitioning operators for Gelly
[ https://issues.apache.org/jira/browse/FLINK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683366#comment-15683366 ] Vasia Kalavri commented on FLINK-1536: -- Hi [~ivan.mushketyk] afaik, nobody is currently working on this. > Graph partitioning operators for Gelly > -- > > Key: FLINK-1536 > URL: https://issues.apache.org/jira/browse/FLINK-1536 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Vasia Kalavri >Priority: Minor > > Smart graph partitioning can significantly improve the performance and > scalability of graph analysis applications. Depending on the computation > pattern, a graph partitioning algorithm divides the graph into (maybe > overlapping) subgraphs, optimizing some objective. For example, if > communication is performed across graph edges, one might want to minimize the > edges that cross from one partition to another. > The problem of graph partitioning is a well studied problem and several > algorithms have been proposed in the literature. The goal of this project > would be to choose a few existing partitioning techniques and implement the > corresponding graph partitioning operators for Gelly. > Some related literature can be found [here| > http://www.citeulike.org/user/vasiakalavri/tag/graph-partitioning].
[jira] [Commented] (FLINK-2254) Add Bipartite Graph Support for Gelly
[ https://issues.apache.org/jira/browse/FLINK-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683354#comment-15683354 ] Vasia Kalavri commented on FLINK-2254: -- Hey [~ivan.mushketyk], I would start with the easy ones, i.e. counts and degrees. I would consider the clustering coefficient as a separate case, possibly as a library algorithm. > Add Bipartite Graph Support for Gelly > - > > Key: FLINK-2254 > URL: https://issues.apache.org/jira/browse/FLINK-2254 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Andra Lungu >Assignee: Ivan Mushketyk > Labels: requires-design-doc > > A bipartite graph is a graph for which the set of vertices can be divided > into two disjoint sets such that each edge has its source vertex in the > first set and its target vertex in the second set. We would like to > support efficient operations for this type of graphs along with a set of > metrics(http://jponnela.com/web_documents/twomode.pdf).
[GitHub] flink issue #2564: [FLINK-2254] Add BipartiateGraph class
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2564 Thanks @mushketyk. @greghogan are you shepherding this PR or shall I?
[jira] [Updated] (FLINK-5097) The TypeExtractor is missing input type information in some Graph methods
[ https://issues.apache.org/jira/browse/FLINK-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-5097: - Description: The TypeExtractor is called without information about the input type in {{mapVertices}} and {{mapEdges}} although this information can be easily retrieved. (was: The TypeExtractor is called without information about the input type in {{mapVertices}}, {{mapEdges}}, and {{fromDataSet}}, although this information can be easily retrieved.) > The TypeExtractor is missing input type information in some Graph methods > - > > Key: FLINK-5097 > URL: https://issues.apache.org/jira/browse/FLINK-5097 > Project: Flink > Issue Type: Bug > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > > The TypeExtractor is called without information about the input type in > {{mapVertices}} and {{mapEdges}} although this information can be easily > retrieved.
[jira] [Created] (FLINK-5097) The TypeExtractor is missing input type information in some Graph methods
Vasia Kalavri created FLINK-5097: Summary: The TypeExtractor is missing input type information in some Graph methods Key: FLINK-5097 URL: https://issues.apache.org/jira/browse/FLINK-5097 Project: Flink Issue Type: Bug Components: Gelly Reporter: Vasia Kalavri Assignee: Vasia Kalavri The TypeExtractor is called without information about the input type in {{mapVertices}}, {{mapEdges}}, and {{fromDataSet}}, although this information can be easily retrieved.
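The root cause is ordinary Java type erasure: a generic function instance carries no record of its type arguments, so an extractor needs the input type as an extra hint unless the function was created as a (possibly anonymous) subclass that pins the arguments into class metadata. A Flink-free sketch of that distinction (the `MapFn` interface here is an illustrative stand-in, not Flink's `MapFunction`):

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Why a type extractor may need the input type: Java erases generics on
// instances, so a function object created from a lambda (or raw instance)
// carries no record of its type arguments. Only subclassing -- e.g. an
// anonymous class -- records them in the class file, where reflection can
// recover them.
public class ErasureDemo {

    public interface MapFn<I, O> { O map(I in); }

    public static String extractedTypeArgs(MapFn<?, ?> fn) {
        // Inspect the generic interfaces of the function's concrete class.
        for (Type t : fn.getClass().getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) t;
                Type[] args = p.getActualTypeArguments();
                return args[0].getTypeName() + " -> " + args[1].getTypeName();
            }
        }
        return "unknown (erased)";   // lambda or raw class: nothing to recover
    }
}
```

An anonymous subclass yields `java.lang.Long -> java.lang.String`, while a lambda of the same type yields `unknown (erased)` — in the latter case the extractor can only proceed if the input type is supplied separately.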
[jira] [Updated] (FLINK-3551) Sync Scala and Java Streaming Examples
[ https://issues.apache.org/jira/browse/FLINK-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-3551: - Assignee: Lim Chee Hau > Sync Scala and Java Streaming Examples > -- > > Key: FLINK-3551 > URL: https://issues.apache.org/jira/browse/FLINK-3551 > Project: Flink > Issue Type: Sub-task > Components: Examples >Affects Versions: 1.0.0 >Reporter: Stephan Ewen >Assignee: Lim Chee Hau > Fix For: 1.0.1 > > > The Scala examples lag behind the Java examples
[GitHub] flink issue #2725: [FLINK-4963] [gelly] Tabulate edge direction for directed...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2725 Hey @greghogan, I didn't have time to review this one, you're a fast merger :) It looks like none of `VertexMetrics`, `EdgeMetrics`, `AverageClusteringCoefficient` are mentioned in the gelly docs. Could you please add them in the "Library Methods" section? Thanks!
[GitHub] flink issue #2730: [FLINK-4970] [gelly] Parameterize vertex value for SSSP
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2730 Thanks! I agree on `Comparable` for vertex types. Good catch!
[GitHub] flink issue #2731: [FLINK-4934] [gelly] Triadic Census
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2731 Hi @greghogan, do we really need a 12k-line csv and a 32k-line csv to test this?
[GitHub] flink issue #2730: [FLINK-4970] [gelly] Parameterize vertex value for SSSP
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2730 Thanks for the PR @greghogan. Changes look good. Just make sure to also update the docs before merging.
[GitHub] flink issue #2670: [FLINK-4204] [gelly] Clean up gelly-examples
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2670 +1 and thanks for the benchmarking link!
[GitHub] flink issue #2670: [FLINK-4204] [gelly] Clean up gelly-examples
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2670 Hi @greghogan, I really like the cleanup and new organization! Two thoughts: - is the plan to add drivers for all library methods? - shall we remove the `GraphMetrics` example since there is a better driver?
[jira] [Resolved] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration
[ https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-3888. -- Resolution: Fixed > Custom Aggregator with Convergence can't be registered directly with > DeltaIteration > --- > > Key: FLINK-3888 > URL: https://issues.apache.org/jira/browse/FLINK-3888 > Project: Flink > Issue Type: Bug > Components: Iterations >Reporter: Martin Liesenberg >Assignee: Vasia Kalavri > Fix For: 1.2.0 > > > Contrary to the > [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html] > the method to add an aggregator with a custom convergence criterion to a > DeltaIteration is not exposed directly to DeltaIteration, but can only be > accessed via the {{aggregatorRegistry}}. > Moreover, when registering an aggregator with a custom convergence criterion > and running the program, the following exception appears in the logs: > {noformat} > Error: Cannot use custom convergence criterion with workset iteration. > Workset iterations have implicit convergence criterion where workset is empty. > org.apache.flink.optimizer.CompilerException: Error: Cannot use custom > convergence criterion with workset iteration. Workset iterations have > implicit convergence criterion where workset is empty. 
> at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164) > at > org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898) > at org.apache.flink.api.java.DataSet.collect(DataSet.java:410) > at org.apache.flink.api.java.DataSet.print(DataSet.java:1605) > {noformat} > The issue has been found while discussing FLINK-2926
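The distinction at play — the implicit workset-empty criterion versus a custom aggregator-based criterion — can be sketched independently of Flink. All names below are illustrative, not the actual `DeltaIteration` API; the halving workset and the aggregate value are stand-ins for real superstep dynamics:

```java
// Sketch of the two convergence modes discussed in FLINK-3888 (hypothetical
// names): delta iterations always stop when the workset is empty, but a
// custom aggregator criterion can additionally stop the iteration early.
public class ConvergenceSketch {

    public interface ConvergenceCriterion<T> {
        boolean isConverged(int superstep, T aggregate);
    }

    // Runs until maxIterations, the workset empties (implicit criterion),
    // or the custom criterion (if any) fires on the per-superstep aggregate.
    public static int run(int worksetSize, int maxIterations,
                          ConvergenceCriterion<Long> custom) {
        for (int step = 1; step <= maxIterations; step++) {
            worksetSize /= 2;                  // stand-in for one superstep
            long aggregate = worksetSize;      // stand-in aggregator value
            if (worksetSize == 0) return step; // implicit: workset is empty
            if (custom != null && custom.isConverged(step, aggregate)) {
                return step;                   // custom criterion fired
            }
        }
        return maxIterations;
    }
}
```

With a workset of 1024 halving each superstep, the implicit criterion stops at step 11, while a custom criterion such as "aggregate below 100" stops at step 4 — which is the early-termination capability the fix exposes.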
[jira] [Updated] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration
[ https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-3888: - Fix Version/s: 1.2.0 > Custom Aggregator with Convergence can't be registered directly with > DeltaIteration > --- > > Key: FLINK-3888 > URL: https://issues.apache.org/jira/browse/FLINK-3888 > Project: Flink > Issue Type: Bug > Components: Iterations >Reporter: Martin Liesenberg >Assignee: Vasia Kalavri > Fix For: 1.2.0 > > > Contrary to the > [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html] > the method to add an aggregator with a custom convergence criterion to a > DeltaIteration is not exposed directly to DeltaIteration, but can only be > accessed via the {{aggregatorRegistry}}. > Moreover, when registering an aggregator with a custom convergence criterion > and running the program, the following exception appears in the logs: > {noformat} > Error: Cannot use custom convergence criterion with workset iteration. > Workset iterations have implicit convergence criterion where workset is empty. > org.apache.flink.optimizer.CompilerException: Error: Cannot use custom > convergence criterion with workset iteration. Workset iterations have > implicit convergence criterion where workset is empty. 
> at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164) > at > org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898) > at org.apache.flink.api.java.DataSet.collect(DataSet.java:410) > at org.apache.flink.api.java.DataSet.print(DataSet.java:1605) > {noformat} > The issue has been found while discussing FLINK-2926
[jira] [Updated] (FLINK-4129) Remove the example HITSAlgorithm
[ https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-4129: - Issue Type: Improvement (was: Bug) > Remove the example HITSAlgorithm > > > Key: FLINK-4129 > URL: https://issues.apache.org/jira/browse/FLINK-4129 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.2.0 > > > {{HITSAlgorithm}} tests for convergence by summing the difference of each > authority score minus the average score. This is simply comparing the sum of > scores against the previous sum of scores which is not a good test for > convergence. > {code} > // count the diff value of sum of authority scores > diffSumAggregator.aggregate(previousAuthAverage - > newAuthorityValue.getValue()); > {code}
[jira] [Resolved] (FLINK-4129) Remove the example HITSAlgorithm
[ https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-4129. -- Resolution: Fixed > Remove the example HITSAlgorithm > > > Key: FLINK-4129 > URL: https://issues.apache.org/jira/browse/FLINK-4129 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.2.0 > > > {{HITSAlgorithm}} tests for convergence by summing the difference of each > authority score minus the average score. This is simply comparing the sum of > scores against the previous sum of scores which is not a good test for > convergence. > {code} > // count the diff value of sum of authority scores > diffSumAggregator.aggregate(previousAuthAverage - > newAuthorityValue.getValue()); > {code}
[jira] [Updated] (FLINK-4129) Remove the example HITSAlgorithm
[ https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-4129: - Summary: Remove the example HITSAlgorithm (was: HITSAlgorithm should test for element-wise convergence) > Remove the example HITSAlgorithm > > > Key: FLINK-4129 > URL: https://issues.apache.org/jira/browse/FLINK-4129 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.2.0 > > > {{HITSAlgorithm}} tests for convergence by summing the difference of each > authority score minus the average score. This is simply comparing the sum of > scores against the previous sum of scores which is not a good test for > convergence. > {code} > // count the diff value of sum of authority scores > diffSumAggregator.aggregate(previousAuthAverage - > newAuthorityValue.getValue()); > {code}
[GitHub] flink issue #2663: [FLINK-4129] [gelly] HITSAlgorithm should test for elemen...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2663 Thanks, will merge.
[GitHub] flink issue #2606: [FLINK-3888] Allow custom convergence criterion in delta ...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2606 Thank you for the review @greghogan! I have addressed your comments.
[GitHub] flink pull request #2606: [FLINK-3888] Allow custom convergence criterion in...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2606#discussion_r83632808 --- Diff: flink-optimizer/src/main/java/org/apache/flink/optimizer/plantranslate/JobGraphGenerator.java --- @@ -1513,14 +1513,21 @@ private void finalizeWorksetIteration(IterationDescriptor descr) { String convAggName = aggs.getConvergenceCriterionAggregatorName(); ConvergenceCriterion convCriterion = aggs.getConvergenceCriterion(); - + if (convCriterion != null || convAggName != null) { - throw new CompilerException("Error: Cannot use custom convergence criterion with workset iteration. Workset iterations have implicit convergence criterion where workset is empty."); + if (convCriterion == null) { + throw new CompilerException("Error: Convergence criterion aggregator set, but criterion is null."); + } + if (convAggName == null) { + throw new CompilerException("Error: Aggregator convergence criterion set, but aggregator is null."); + } + + syncConfig.setConvergenceCriterion(convAggName, convCriterion); } headConfig.addIterationAggregator(WorksetEmptyConvergenceCriterion.AGGREGATOR_NAME, new LongSumAggregator()); syncConfig.addIterationAggregator(WorksetEmptyConvergenceCriterion.AGGREGATOR_NAME, new LongSumAggregator()); - syncConfig.setConvergenceCriterion(WorksetEmptyConvergenceCriterion.AGGREGATOR_NAME, new WorksetEmptyConvergenceCriterion()); + syncConfig.setDefaultConvergenceCriterion(WorksetEmptyConvergenceCriterion.AGGREGATOR_NAME, new WorksetEmptyConvergenceCriterion()); --- End diff -- Sure that's possible, but each iteration will have its own TaskConfig.
[jira] [Commented] (FLINK-4129) HITSAlgorithm should test for element-wise convergence
[ https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579933#comment-15579933 ]

Vasia Kalavri commented on FLINK-4129:
--

I think having two HITS examples could be confusing to users. Is this implementation showcasing some feature that no other example does, or could we simply remove it in favor of the HITS driver?

> HITSAlgorithm should test for element-wise convergence
> --
>
> Key: FLINK-4129
> URL: https://issues.apache.org/jira/browse/FLINK-4129
> Project: Flink
> Issue Type: Bug
> Components: Gelly
> Affects Versions: 1.1.0
> Reporter: Greg Hogan
> Priority: Minor
>
> {{HITSAlgorithm}} tests for convergence by summing the difference of each authority score minus the average score. This is simply comparing the sum of scores against the previous sum of scores, which is not a good test for convergence.
> {code}
> // count the diff value of sum of authority scores
> diffSumAggregator.aggregate(previousAuthAverage - newAuthorityValue.getValue());
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
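[Editor's note] The weakness of the aggregate test is easy to demonstrate with plain arrays: signed differences can cancel out, so the summed value looks converged while individual scores still move. A minimal plain-Java sketch, independent of the Gelly/aggregator API (all names below are illustrative):

```java
public class ConvergenceCheck {

    // Aggregate test in the spirit of HITSAlgorithm: sum of signed differences.
    static double signedDiffSum(double[] prev, double[] curr) {
        double sum = 0.0;
        for (int i = 0; i < prev.length; i++) {
            sum += prev[i] - curr[i];
        }
        return sum;
    }

    // Element-wise test: largest absolute per-score change.
    static double maxAbsDiff(double[] prev, double[] curr) {
        double max = 0.0;
        for (int i = 0; i < prev.length; i++) {
            max = Math.max(max, Math.abs(prev[i] - curr[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] prev = {0.5, 0.5};
        double[] curr = {0.9, 0.1}; // scores swapped mass; the total is unchanged
        System.out.println(signedDiffSum(prev, curr)); // 0.0 -> looks converged
        System.out.println(maxAbsDiff(prev, curr));    // 0.4 -> clearly not converged
    }
}
```

An element-wise criterion (e.g. max absolute change below a threshold) cannot be fooled by such cancellation, which is what the issue title asks for.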
[jira] [Commented] (FLINK-1091) Allow joins with the solution set using key selectors
[ https://issues.apache.org/jira/browse/FLINK-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579877#comment-15579877 ]

Vasia Kalavri commented on FLINK-1091:
--

Hi [~neelesh77], I'm not working on this. I've unassigned the issue. Do you have a use-case where you need this?

> Allow joins with the solution set using key selectors
> --
>
> Key: FLINK-1091
> URL: https://issues.apache.org/jira/browse/FLINK-1091
> Project: Flink
> Issue Type: Sub-task
> Components: Iterations
> Reporter: Vasia Kalavri
> Priority: Minor
> Labels: easyfix, features
>
> Currently, the solution set may only be joined with using tuple field positions.
> A possible solution can be providing explicit functions "joinWithSolution" and "coGroupWithSolution" to make sure the keys used are valid.
[jira] [Updated] (FLINK-1091) Allow joins with the solution set using key selectors
[ https://issues.apache.org/jira/browse/FLINK-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vasia Kalavri updated FLINK-1091:
--

Assignee: (was: Vasia Kalavri)

> Allow joins with the solution set using key selectors
> --
>
> Key: FLINK-1091
> URL: https://issues.apache.org/jira/browse/FLINK-1091
> Project: Flink
> Issue Type: Sub-task
> Components: Iterations
> Reporter: Vasia Kalavri
> Priority: Minor
> Labels: easyfix, features
>
> Currently, the solution set may only be joined with using tuple field positions.
> A possible solution can be providing explicit functions "joinWithSolution" and "coGroupWithSolution" to make sure the keys used are valid.
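[Editor's note] For context, the semantics such an explicit function would need can be sketched in plain Java against a hash-indexed solution set. This is a toy model only: the method name `joinWithSolution` comes from the issue text, but the signature and everything else below are hypothetical, not Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class KeySelectorJoin {

    // Join a workset against a solution set using an arbitrary key selector,
    // instead of being restricted to tuple field positions.
    static <T, K, S> List<String> joinWithSolution(
            List<T> workset,
            Map<K, S> solutionSet,
            Function<T, K> keySelector) {
        List<String> joined = new ArrayList<>();
        for (T element : workset) {
            K key = keySelector.apply(element);
            S match = solutionSet.get(key);
            if (match != null) {
                joined.add(key + "->" + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<Long, String> solution = new HashMap<>();
        solution.put(1L, "a");
        solution.put(2L, "b");
        // Records are (key, payload) pairs; the selector extracts field 0.
        List<long[]> workset = Arrays.asList(new long[]{1L, 10L}, new long[]{3L, 30L});
        List<String> result = joinWithSolution(workset, solution, rec -> rec[0]);
        System.out.println(result); // prints [1->a]: only the record with key 1 matches
    }
}
```

In the real runtime the solution set is partitioned and indexed by its declared key fields, which is why validating a user-supplied selector against those keys (as the issue suggests) matters.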
[GitHub] flink pull request #2606: Allow custom convergence criterion in delta iterat...
GitHub user vasia opened a pull request: https://github.com/apache/flink/pull/2606

Allow custom convergence criterion in delta iterations

As discussed in the Jira issue, this PR contains the following changes:
1. use `TaskConfig.setConvergenceCriterion()` to set the custom, user-defined convergence criterion (like in the case of bulk iteration)
2. add a new method `TaskConfig.setDefaultConvergenceCriterion()` to handle the default empty-workset convergence
3. check both criteria in `IterationSynchronizationSinkTask.checkForConvergence()`
4. expose the custom convergence criterion in `DeltaIteration`

It also contains some minor cleanup and corresponding changes in the `CollectionExecutor`. The iteration docs already state that custom convergence is possible, so no update needed there ;)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vasia/flink flink-3888

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2606.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2606

commit 41327034ae57d99fdde74bf24838d19b5aee31f3
Author: vasia <va...@apache.org>
Date: 2016-10-05T11:49:20Z
[FLINK-3888] allow registering a custom convergence criterion in delta iterations

commit e7ffc368f8542d3afe823b86c23e91379a03e21b
Author: vasia <va...@apache.org>
Date: 2016-10-05T12:06:47Z
[FLINK-3888] cleanups in iterations and aggregators code

commit 3b1a5e55686f665b1c1bb90943b0f853e71eae82
Author: vasia <va...@apache.org>
Date: 2016-10-06T20:25:43Z
[FLINK-3888] add delta convergence criterion in the CollectionExecutor

commit 9f41af544eecae0c37ae9470f4ff26f19b5dbdc0
Author: vasia <va...@apache.org>
Date: 2016-10-06T21:00:38Z
[FLINK-3888] add ITCases for delta custom convergence
[jira] [Assigned] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration
[ https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vasia Kalavri reassigned FLINK-3888:
--

Assignee: Vasia Kalavri

> Custom Aggregator with Convergence can't be registered directly with DeltaIteration
> --
>
> Key: FLINK-3888
> URL: https://issues.apache.org/jira/browse/FLINK-3888
> Project: Flink
> Issue Type: Bug
> Components: Iterations
> Reporter: Martin Liesenberg
> Assignee: Vasia Kalavri
>
> Contrary to the [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html], the method to add an aggregator with a custom convergence criterion to a DeltaIteration is not exposed directly on DeltaIteration, but can only be accessed via the {{aggregatorRegistry}}.
> Moreover, when registering an aggregator with a custom convergence criterion and running the program, the following exception appears in the logs:
> {noformat}
> Error: Cannot use custom convergence criterion with workset iteration. Workset iterations have implicit convergence criterion where workset is empty.
> org.apache.flink.optimizer.CompilerException: Error: Cannot use custom convergence criterion with workset iteration. Workset iterations have implicit convergence criterion where workset is empty.
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518)
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198)
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164)
> 	at org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76)
> 	at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898)
> 	at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
> 	at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
> {noformat}
> The issue has been found while discussing FLINK-2926
[GitHub] flink issue #2564: [FLINK-2254] Add BipartiateGraph class
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2564

Thanks for the update @mushketyk and for the review @greghogan. I agree with your suggestions. For the type parameters I would go for `<KT, KB, VVT, VVB, EV>`. Let me know if there's any other issue you'd like my opinion on.
[GitHub] flink issue #2587: [FLINK-4729] [gelly] Use optional VertexCentric CombineFu...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2587

Great cleanup! Thanks @greghogan. +1 to merge.
[jira] [Commented] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration
[ https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543256#comment-15543256 ]

Vasia Kalavri commented on FLINK-3888:
--

We shouldn't override the default convergence criterion of the delta iteration. When the workset is empty there's no work to do. Instead, if a custom criterion is provided, the convergence condition should be the disjunction of the two.

> Custom Aggregator with Convergence can't be registered directly with DeltaIteration
> --
>
> Key: FLINK-3888
> URL: https://issues.apache.org/jira/browse/FLINK-3888
> Project: Flink
> Issue Type: Bug
> Components: Iterations
> Reporter: Martin Liesenberg
>
> Contrary to the [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html], the method to add an aggregator with a custom convergence criterion to a DeltaIteration is not exposed directly on DeltaIteration, but can only be accessed via the {{aggregatorRegistry}}.
> Moreover, when registering an aggregator with a custom convergence criterion and running the program, the following exception appears in the logs:
> {noformat}
> Error: Cannot use custom convergence criterion with workset iteration. Workset iterations have implicit convergence criterion where workset is empty.
> org.apache.flink.optimizer.CompilerException: Error: Cannot use custom convergence criterion with workset iteration. Workset iterations have implicit convergence criterion where workset is empty.
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518)
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198)
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164)
> 	at org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76)
> 	at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898)
> 	at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
> 	at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
> {noformat}
> The issue has been found while discussing FLINK-2926

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
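[Editor's note] The disjunctive semantics discussed in this thread — the delta iteration terminates when the workset is empty or when a user-defined criterion fires — can be sketched in plain Java. This is a toy model: `isConverged` and the threshold-based custom criterion are illustrative stand-ins, not the actual synchronization-task internals.

```java
public class DeltaConvergence {

    // Implicit default criterion of a workset iteration:
    // converge when no elements were emitted into the workset.
    static boolean worksetEmpty(long worksetElementCount) {
        return worksetElementCount == 0;
    }

    // Combined check: the iteration stops when EITHER the implicit
    // workset-empty criterion OR the optional custom criterion
    // (modeled here as an aggregate falling below a threshold) holds.
    static boolean isConverged(long worksetElementCount,
                               double aggregate, double threshold,
                               boolean hasCustomCriterion) {
        boolean custom = hasCustomCriterion && aggregate < threshold;
        return worksetEmpty(worksetElementCount) || custom;
    }

    public static void main(String[] args) {
        System.out.println(isConverged(0, 5.0, 1.0, true));  // true: workset empty
        System.out.println(isConverged(10, 0.5, 1.0, true)); // true: custom criterion met
        System.out.println(isConverged(10, 5.0, 1.0, true)); // false: neither holds
    }
}
```

Note that the disjunction never weakens the default: an empty workset still terminates the iteration, which is exactly the behavior the comment above argues must be preserved.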