[GitHub] flink pull request: [FLINK-2149][gelly] Simplified Jaccard Example
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/770#issuecomment-113752297
Merging...
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2149][gelly] Simplified Jaccard Example
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/770
[GitHub] flink pull request: [FLINK-2149][gelly] Simplified Jaccard Example
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/770#issuecomment-111807874
PR updated.
[GitHub] flink pull request: [FLINK-2149][gelly] Simplified Jaccard Example
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/770#discussion_r32374939

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/JaccardSimilarityMeasure.java ---
    @@ -66,34 +63,47 @@ public static void main(String [] args) throws Exception {
     		DataSet<Edge<Long, Double>> edges = getEdgesDataSet(env);
    -		Graph<Long, NullValue, Double> graph = Graph.fromDataSet(edges, env);
    +		Graph<Long, HashSet<Long>, Double> graph = Graph.fromDataSet(edges,
    +				new MapFunction<Long, HashSet<Long>>() {
    -		DataSet<Vertex<Long, HashSet<Long>>> verticesWithNeighbors =
    -				graph.groupReduceOnEdges(new GatherNeighbors(), EdgeDirection.ALL);
    +					@Override
    +					public HashSet<Long> map(Long id) throws Exception {
    +						HashSet<Long> neighbors = new HashSet<Long>();
    +						neighbors.add(id);
    -		Graph<Long, HashSet<Long>, Double> graphWithVertexValues = Graph.fromDataSet(verticesWithNeighbors, edges, env);
    +						return new HashSet<Long>(neighbors);
    +					}
    +				}, env);
    -		// the edge value will be the Jaccard similarity coefficient(number of common neighbors/ all neighbors)
    -		DataSet<Tuple3<Long, Long, Double>> edgesWithJaccardWeight = graphWithVertexValues.getTriplets()
    -				.map(new WeighEdgesMapper());
    +		// create the set of neighbors
    +		DataSet<Tuple2<Long, HashSet<Long>>> computedNeighbors =
    +				graph.reduceOnNeighbors(new GatherNeighbors(), EdgeDirection.ALL);
    -		DataSet<Edge<Long, Double>> result = graphWithVertexValues.joinWithEdges(edgesWithJaccardWeight,
    -				new MapFunction<Tuple2<Double, Double>, Double>() {
    +		// join with the vertices to update the node values
    +		DataSet<Vertex<Long, HashSet<Long>>> verticesWithNeighbors =
    +				graph.joinWithVertices(computedNeighbors, new MapFunction<Tuple2<HashSet<Long>, HashSet<Long>>,
    +						HashSet<Long>>() {
     					@Override
    -					public Double map(Tuple2<Double, Double> value) throws Exception {
    -						return value.f1;
    +					public HashSet<Long> map(Tuple2<HashSet<Long>, HashSet<Long>> tuple2) throws Exception {
    +						return tuple2.f1;
     					}
    -				}).getEdges();
    +				}).getVertices();
    +
    +		Graph<Long, HashSet<Long>, Double> graphWithVertexValues = Graph.fromDataSet(verticesWithNeighbors, edges, env);
    --- End diff --

joinWithVertices can give you the Graph directly :)
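For readers without a Flink setup, the quantity this example assigns to each edge (per the comment in the diff: number of common neighbors / all neighbors) can be sketched in plain Java. This is only an illustration, not the code under review: the class name `JaccardSketch`, the helper methods, and the sample edge list are hypothetical, and the sketch assumes undirected edges and neighbor sets that exclude the vertex itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone sketch; it does not use the Flink or Gelly APIs.
public class JaccardSketch {

    // Jaccard coefficient of two neighbor sets: |a intersect b| / |a union b|.
    public static double jaccard(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Long> union = new HashSet<>(a);
        union.addAll(b);
        // Two empty sets have no defined ratio; this sketch returns 0.0 for them.
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    // Collects, for every vertex, the ids on the other end of its edges,
    // treating each edge as undirected (the analog of EdgeDirection.ALL).
    public static Map<Long, Set<Long>> neighborSets(long[][] edges) {
        Map<Long, Set<Long>> neighbors = new HashMap<>();
        for (long[] e : edges) {
            neighbors.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            neighbors.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return neighbors;
    }

    public static void main(String[] args) {
        // Made-up sample graph, chosen only for illustration.
        long[][] edges = { {1, 2}, {1, 3}, {2, 3}, {3, 4} };
        Map<Long, Set<Long>> n = neighborSets(edges);
        for (long[] e : edges) {
            System.out.println(e[0] + " - " + e[1] + ": " + jaccard(n.get(e[0]), n.get(e[1])));
        }
    }
}
```

In the PR itself the neighbor sets live in the vertex values of the Gelly graph; here a plain `Map` stands in for them.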
[GitHub] flink pull request: [FLINK-2149][gelly] Simplified Jaccard Example
GitHub user andralungu opened a pull request:

    https://github.com/apache/flink/pull/770

    [FLINK-2149][gelly] Simplified Jaccard Example

    This PR simplifies Gelly's Jaccard example by using the more efficient reduceOnNeighbors rather than groupReduceOnNeighbors.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andralungu/flink jaccardImprovement

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/770.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #770

commit 0e189c6af9a5fb80b4999a60a431d60cf95944db
Author: andralungu lungu.an...@gmail.com
Date:   2015-06-03T14:12:16Z

    [FLINK-2149][gelly] Simplified Jaccard Example
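The difference between the two aggregation styles named in the description can be sketched without Flink. As far as I understand it, a reduce-style function only ever merges two partial results, so it can be applied incrementally when it is associative and commutative, while a group-style function receives all neighbor values of a vertex at once. The plain-Java sketch below is only an illustration under that assumption: `NeighborReduceSketch` and its methods are hypothetical names, not Gelly API, and the singleton input sets mirror how the diff initializes each vertex value with its own id.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; it does not use the Flink or Gelly APIs.
public class NeighborReduceSketch {

    // Group-style aggregation: sees every neighbor value of one vertex at once.
    public static Set<Long> gatherGrouped(List<Set<Long>> neighborValues) {
        Set<Long> result = new HashSet<>();
        for (Set<Long> value : neighborValues) {
            result.addAll(value);
        }
        return result;
    }

    // Reduce-style aggregation step: merges exactly two partial results.
    // Set union is associative and commutative, so the merge order does not matter.
    public static Set<Long> union(Set<Long> first, Set<Long> second) {
        Set<Long> merged = new HashSet<>(first);
        merged.addAll(second);
        return merged;
    }

    // Applies the pairwise step repeatedly, one neighbor value at a time.
    public static Set<Long> gatherReduced(List<Set<Long>> neighborValues) {
        Set<Long> accumulated = new HashSet<>();
        for (Set<Long> value : neighborValues) {
            accumulated = union(accumulated, value);
        }
        return accumulated;
    }

    public static void main(String[] args) {
        // Values of the neighbors of one vertex: each neighbor starts as a
        // singleton set holding its own id (made-up ids 1, 2 and 4).
        List<Set<Long>> neighborValues = new ArrayList<>();
        for (long id : new long[] {1L, 2L, 4L}) {
            Set<Long> singleton = new HashSet<>();
            singleton.add(id);
            neighborValues.add(singleton);
        }
        System.out.println(gatherGrouped(neighborValues).equals(gatherReduced(neighborValues)));
    }
}
```

Both styles produce the same neighbor set; the sketch shows only why the pairwise form is available for this aggregation and makes no claim about how much faster it runs in Flink.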