[GitHub] flink pull request: [FLINK-2141] Allow GSA's Gather to perform thi...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/877 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/877#issuecomment-120654787 Merging...
Github user shghatge commented on the pull request: https://github.com/apache/flink/pull/877#issuecomment-118921231 Updated PR.
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/877#issuecomment-118912040 Hi @shghatge, @andralungu! I left one comment that I think is quite serious. Apart from that, I had also left a minor comment on the gelly-guide changes in the last review. Once these are fixed, you have my +1 :)
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/877#discussion_r33951120 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/gsa/GatherSumApplyIteration.java --- @@ -367,6 +394,20 @@ public void join(Vertex vertex, Edge edge, Collectorf0") --- End diff -- I think this annotation is wrong. It's the first field that's forwarded (the edge source). If that's the case, can you please investigate why your tests don't catch this?
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/877#issuecomment-118891421 I see the requirements have been fulfilled here. If no objections, I'd like to merge this by the end of the week :)
Github user shghatge commented on the pull request: https://github.com/apache/flink/pull/877#issuecomment-11815 Updated PR
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/877#issuecomment-118180990 In my view it's a straightforward configuration option. If the user wants to propagate changes in this direction, they set the parameter. Also, I'm seeing that Gelly is starting to have way too many examples. Examples are there to show basic functionality, not every single feature of the API. Library methods are there to make common analysis easy to run, and docs are there to help with configuration options and customization, i.e. we should focus on library and docs and only add examples to demonstrate major new additions.
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/877#issuecomment-118179979 Hihi! Shivani is updating the PR on the fly :) Reminds me of myself back in the day ;) If I may tease: If these GSA parameters did not have an example, I would have asked for one. So my two cents: shouldn't we, instead of deleting completely, replace the example with a better one? IncrementalGSASSSP :))
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/877#issuecomment-118179278 Hi @shghatge! This looks good in general. Apart from my small inline comment, I have an objection regarding the added example. If I understand correctly, this example tries to compute all vertices that are reachable from each vertex. Is that correct? This is a computation that will cause the state to explode really fast, even for moderately large graphs. I would suggest that you remove the example completely. The docs should be enough to show the available functionality :)
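To make the state-size concern above concrete, here is a minimal plain-Java sketch. It is not Gelly code and not the example from the PR: the class `ReachabilityStateSketch` and the method `totalStateSize` are hypothetical names, and the graph is assumed to be a simple directed chain 0 -> 1 -> ... -> n-1. Each vertex keeps the set of vertices it can reach, and the sketch counts how many IDs end up stored across all vertices.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReachabilityStateSketch {

    /**
     * Builds, for every vertex of the chain 0 -> 1 -> ... -> n-1, the set of
     * vertices reachable from it, and returns the total number of stored IDs.
     * (Hypothetical helper for illustration; not part of Flink or Gelly.)
     */
    public static long totalStateSize(int n) {
        List<Set<Integer>> reachable = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            reachable.add(new HashSet<>());
        }

        // Iterate to a fixed point: each vertex merges in its out-neighbor
        // and everything that neighbor already knows to be reachable.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int v = n - 2; v >= 0; v--) {
                Set<Integer> state = reachable.get(v);
                changed |= state.add(v + 1);
                changed |= state.addAll(reachable.get(v + 1));
            }
        }

        long total = 0;
        for (Set<Integer> state : reachable) {
            total += state.size();
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " vertices -> " + totalStateSize(n) + " stored IDs");
        }
    }
}
```

On this chain the total is n(n-1)/2, so the stored state grows quadratically with the number of vertices even though the graph itself has only n-1 edges. Denser graphs can be expected to fare no better.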
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/877#discussion_r33826389 --- Diff: docs/libs/gelly_guide.md --- @@ -734,6 +737,24 @@ public static final class Apply { {% endhighlight %} +The following example illustrates the usage of the edge direction option. Vertices update their values to contain a list of all their in-neighbors. --- End diff -- What do you mean by "contain a list of all their in-neighbors"? Were you planning to add an example where vertices assign their neighborhoods as their values?
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/877#issuecomment-117785914 LGTM :) @vasia ?
Github user shghatge commented on the pull request: https://github.com/apache/flink/pull/877#issuecomment-117730019 Updated PR.
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/877#issuecomment-117540482

Hi @shghatge,

This PR looks very nice; I added some minor inline comments to make it look even nicer :)

One "major" problem: you have an example and a test for it; however, you don't have tests for the GSAConfiguration itself; check https://github.com/apache/flink/blob/master/flink-staging/flink-gelly/src/test/java/org/apache/flink/graph/test/VertexCentricConfigurationITCase.java
You need to make sure that, given no direction, the iteration works as before, and that if you give it direction IN or ALL, it behaves as it is supposed to.

Apart from that, this looks almost spotless.

One other minor thing: when you make the PR, in order to have it uber-clean you should squash your commits. In this case, you have 3 commits, so:
git rebase -i HEAD~3
should do the trick. Then leave pick for the first commit, squash for the others, and let the instructions guide you (Ctrl-x; Y, etc...). Then force push.

The common practice afterwards (i.e. after you address my comments here) is to leave it as a new commit (don't squash!). So, first squash these 3, then fix the minor issues and commit again.

Awesome work!
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/877#discussion_r33657394 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GSAExistenceOfPaths.java --- @@ -0,0 +1,193 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph.example; + +import org.apache.flink.api.common.ProgramDescription; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple1; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.example.utils.PathExistenceData; +import org.apache.flink.graph.gsa.ApplyFunction; +import org.apache.flink.graph.gsa.GatherFunction; +import org.apache.flink.graph.gsa.Neighbor; +import org.apache.flink.graph.gsa.SumFunction; +import org.apache.flink.graph.gsa.GSAConfiguration; + + +import java.util.HashSet; + +/** + * This example implements a program in which we find out the vertices for which there exists a path from given vertex + * + * The edges input file is expected to contain one edge per line, with long IDs and long values + * The vertices input file is expected to contain one vertex per line with long IDs and no value + * If no arguments are provided, the example runs with default data for this example + */ +public class GSAExistenceOfPaths implements ProgramDescription { + + @SuppressWarnings("serial") --- End diff -- not sure you need this annotation... could you run the example without it and see whether you get a warning?
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/877#discussion_r33657193 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GSAExistenceOfPaths.java --- @@ -0,0 +1,193 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph.example; + +import org.apache.flink.api.common.ProgramDescription; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple1; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.example.utils.PathExistenceData; +import org.apache.flink.graph.gsa.ApplyFunction; +import org.apache.flink.graph.gsa.GatherFunction; +import org.apache.flink.graph.gsa.Neighbor; +import org.apache.flink.graph.gsa.SumFunction; +import org.apache.flink.graph.gsa.GSAConfiguration; + + +import java.util.HashSet; + +/** + * This example implements a program in which we find out the vertices for which there exists a path from given vertex + * + * The edges input file is expected to contain one edge per line, with long IDs and long values + * The vertices input file is expected to contain one vertex per line with long IDs and no value + * If no arguments are provided, the example runs with default data for this example + */ +public class GSAExistenceOfPaths implements ProgramDescription { + + @SuppressWarnings("serial") + public static void main(String[] args) throws Exception { + + if(!parseParameters(args)) { + return; + } + + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + + DataSet> edges = getEdgesDataSet(env); + DataSet>> vertices = getVerticesDataSet(env); + + Graph, Long> graph = Graph.fromTupleDataSet(vertices, edges, env); + + + GSAConfiguration parameters = new GSAConfiguration(); + parameters.setDirection(EdgeDirection.IN); + // Execute the GSA iteration + Graph, Long> result = graph.runGatherSumApplyIteration(new GetReachableVertices(), + new FindAllReachableVertices(), + new 
UpdateReachableVertices(), + maxIterations, parameters); + + // Extract the vertices as the result + DataSet>> reachableVertices = result.getVertices(); + + // emit result + if (fileOutput) { + reachableVertices.writeAsCsv(outputPath, "\n", ","); + + env.execute("GSA Path Existence"); + } else { + reachableVertices.print(); + } + + } + + // + // Path Existence UDFs + // + + @SuppressWarnings("serial") + private static final class GetReachableVertices extends GatherFunction, Long, HashSet> { + + @Override + public HashSet gather(Neighbor, Long> neighbor) { + return neighbor.getNeighborValue()
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/877#discussion_r33657118 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GSAExistenceOfPaths.java --- @@ -0,0 +1,193 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph.example; + +import org.apache.flink.api.common.ProgramDescription; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple1; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.example.utils.PathExistenceData; +import org.apache.flink.graph.gsa.ApplyFunction; +import org.apache.flink.graph.gsa.GatherFunction; +import org.apache.flink.graph.gsa.Neighbor; +import org.apache.flink.graph.gsa.SumFunction; +import org.apache.flink.graph.gsa.GSAConfiguration; + + +import java.util.HashSet; + +/** + * This example implements a program in which we find out the vertices for which there exists a path from given vertex + * + * The edges input file is expected to contain one edge per line, with long IDs and long values + * The vertices input file is expected to contain one vertex per line with long IDs and no value + * If no arguments are provided, the example runs with default data for this example + */ +public class GSAExistenceOfPaths implements ProgramDescription { + + @SuppressWarnings("serial") + public static void main(String[] args) throws Exception { + + if(!parseParameters(args)) { + return; + } + + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + + DataSet> edges = getEdgesDataSet(env); + DataSet>> vertices = getVerticesDataSet(env); + + Graph, Long> graph = Graph.fromTupleDataSet(vertices, edges, env); + + + GSAConfiguration parameters = new GSAConfiguration(); + parameters.setDirection(EdgeDirection.IN); + // Execute the GSA iteration + Graph, Long> result = graph.runGatherSumApplyIteration(new GetReachableVertices(), + new FindAllReachableVertices(), --- End 
diff -- leave just one blank line here
Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/877#discussion_r33657016 --- Diff: docs/libs/gelly_guide.md --- @@ -693,6 +693,9 @@ Currently, the following parameters can be specified: * Number of Vertices: Accessing the total number of vertices within the iteration. This property can be set using the `setOptNumVertices()` method. The number of vertices can then be accessed in the gather, sum and/or apply functions by using the `getNumberOfVertices()` method. If the option is not set in the configuration, this method will return -1. +* Neighbor Direction: By Default values are gathered from the out neighbors of the Vertex. This can be modified --- End diff -- "default" should be written without a capital D; "set direction" should be written without a capital s and in between inverted commas (i.e. `setDirection()`). It would be nice to have an example in the docs, showing users how to play with this option; check the number of vertices example for inspiration :)
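As a rough illustration of what the neighbor direction option in the docs diff above describes, the following plain-Java sketch models which neighbors a vertex would gather from under each setting. It deliberately does not use the Flink API: `GatherDirectionSketch`, its `Direction` enum, its `Edge` class, and `gatherNeighbors` are hypothetical stand-ins, and the semantics assumed here (OUT follows outgoing edges to their targets, IN follows incoming edges back to their sources, ALL does both) are inferred from the wording of the diff rather than from the implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class GatherDirectionSketch {

    /** Hypothetical stand-in for the direction setting; not Gelly's EdgeDirection. */
    public enum Direction { IN, OUT, ALL }

    /** A directed edge from source to target (hypothetical helper type). */
    public static final class Edge {
        final long source;
        final long target;

        public Edge(long source, long target) {
            this.source = source;
            this.target = target;
        }
    }

    /** Returns the IDs of the neighbors that the given vertex would gather from. */
    public static Set<Long> gatherNeighbors(long vertex, List<Edge> edges, Direction direction) {
        Set<Long> neighbors = new TreeSet<>();
        for (Edge e : edges) {
            // OUT and ALL: follow outgoing edges to their targets.
            if (direction != Direction.IN && e.source == vertex) {
                neighbors.add(e.target);
            }
            // IN and ALL: follow incoming edges back to their sources.
            if (direction != Direction.OUT && e.target == vertex) {
                neighbors.add(e.source);
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(new Edge(1, 2), new Edge(3, 2), new Edge(2, 4));
        System.out.println("OUT: " + gatherNeighbors(2, edges, Direction.OUT)); // OUT: [4]
        System.out.println("IN:  " + gatherNeighbors(2, edges, Direction.IN));  // IN:  [1, 3]
        System.out.println("ALL: " + gatherNeighbors(2, edges, Direction.ALL)); // ALL: [1, 3, 4]
    }
}
```

In the PR itself the choice is made on the configuration object, as the quoted example does with `parameters.setDirection(EdgeDirection.IN)`. The sketch only shows the effect that choice would be expected to have on the gathered neighborhood.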
GitHub user shghatge opened a pull request: https://github.com/apache/flink/pull/877

[FLINK-2141] Allow GSA's Gather to perform this operation in more than one direction

Added the setDirection() and getDirection() methods to GSAConfiguration.java.
Added functionality to gather values from chosen neighbors instead of only OUT neighbors.
Added a simple example which generates, as the value of each vertex, the list of vertices to which there exists a path from that vertex.
Added util classes for the example and also added a test.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shghatge/flink vertex

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/877.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #877

commit d88329024394cef17a3454d20e6905f7d7363004 Author: Shivani Date: 2015-07-01T01:03:58Z
[FLINK-2141][gelly] Allow GSA's Gather to perform this operation in more than one direction

commit b2a29d6763820a2be6b1a71f0224fda83861c2c4 Author: Shivani Date: 2015-07-01T01:10:41Z
[FLINK-2141][gelly] Allow GSA's Gather to perform this operation in more than one direction