[GitHub] flink pull request: [FLINK-1758][gelly] Neighborhood Methods Exten...

2015-04-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/576


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1758][gelly] Neighborhood Methods Exten...

2015-04-25 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/576#issuecomment-96271897
  
Hey @andralungu!
I see your point regarding `iterateOn...` and `reduce...`. Let's leave it 
like this for now.
Only small thing that maybe I didn't explain well, is operate directly on 
the values for the combinable versions. I meant there's no need to pass the 
vertex keys at all.
This is a small change, I can make it before merging.
Otherwise, everything looks good, I'll merge right after #408. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1758][gelly] Neighborhood Methods Exten...

2015-04-20 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/576#issuecomment-94572014
  
Apart from the fact that I don't agree with those two methods having the 
same name, everything else was updated. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1758][gelly] Neighborhood Methods Exten...

2015-04-20 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/576#issuecomment-94461019
  
Hi @vasia ,

The inline comments have been addressed, I still have the big comment to 
look at :)

One thing(or two ^^) that could be discussed: 
- the inconsistency between the iterateOn* and the reduce* method names 
comes from the fact that the two methods have different behaviour: reduce takes 
the elements two by two and aggregates them while iterateOn uses an iterator to 
go through the edges. Hence the different names. 
Imagine something like this:
Tuple2K, EdgeK, EV iterateOnEdges(Tuple2K, EdgeK, EV firstEdge, 
Tuple2K, EdgeK, EV secondEdge);
when you're not iterating on anything(or vice-versa reduceOnEdges(Iterable 
edges)).

- I know ReduceEdgesFunction is not a pretty name, but I don't have another 
one, ReduceFunction is already taken :P. When you think of a better name, tell 
me and I can do the refactoring.
 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1758][gelly] Neighborhood Methods Exten...

2015-04-17 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/576#discussion_r28594318
  
--- Diff: docs/gelly_guide.md ---
@@ -282,25 +290,28 @@ The following code will collect the out-edges for 
each vertex and apply the `Sel
 {% highlight java %}
 GraphLong, Long, Double graph = ...
 
-DataSetTuple2Long, Double minWeights = graph.reduceOnEdges(
+DataSetTuple2Long, Double minWeights = graph.groupReduceOnEdges(
new SelectMinWeight(), EdgeDirection.OUT);
 
 // user-defined function to select the minimum weight
-static final class SelectMinWeight implements EdgesFunctionLong, Double, 
Tuple2Long, Double {
+static final class SelectMinWeightNeighbor implements 
EdgesFunctionWithVertexValueLong, Long, Long, Tuple2Long, Long {
 
-public Tuple2Long, Double iterateEdges(IterableTuple2Long, 
EdgeLong, Double edges) {
+   @Override
+   public void iterateEdges(VertexLong, Long v,
+   IterableEdgeLong, Long edges, 
CollectorTuple2Long, Long out) throws Exception {
 
-long minWeight = Double.MAX_VALUE;
-long vertexId = -1;
+   long weight = Long.MAX_VALUE;
+   long minNeighborId = 0;
 
-for (Tuple2Long, EdgeLong, Double edge: edges) {
-if (edge.f1.getValue()  weight) {
-weight = edge.f1.getValue();
-vertexId = edge.f0;
-}
-return new Tuple2Long, Double(vertexId, minWeight);
-}
-}
+   for (EdgeLong, Long edge: edges) {
+   if (edge.getValue()  weight) {
+   weight = edge.getValue();
+   minNeighborId = edge.getTarget();
+   }
+   }
+   out.collect(new Tuple2Long, Long(v.getId(), 
minNeighborId));
+   }
+   }
--- End diff --

I think it's confusing what this example is doing now. The description says 
that each vertex selects the min weight among its edges, yet you modified it to 
return the ID of the target neighbor of that edge. Do you want to show 
something that I'm missing here? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1758][gelly] Neighborhood Methods Exten...

2015-04-17 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/576#discussion_r28595115
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/EdgeDirection.java
 ---
@@ -20,10 +20,10 @@
 
 /**
  * The EdgeDirection is used to select a node's neighborhood
- * by the {@link Graph#reduceOnEdges(EdgesFunction, EdgeDirection)},
- * {@link Graph#reduceOnEdges(EdgesFunctionWithVertexValue, 
EdgeDirection)},
- * {@link Graph#reduceOnNeighbors(NeighborsFunction, EdgeDirection)} and
- * {@link Graph#reduceOnNeighbors(NeighborsFunctionWithVertexValue, 
EdgeDirection)}
+ * by the {@link Graph#groupReduceOnEdges(EdgesFunction, EdgeDirection)},
--- End diff --

isn't it used in both cases?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1758][gelly] Neighborhood Methods Exten...

2015-04-17 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/576#discussion_r28594425
  
--- Diff: docs/gelly_guide.md ---
@@ -282,25 +290,28 @@ The following code will collect the out-edges for 
each vertex and apply the `Sel
 {% highlight java %}
 GraphLong, Long, Double graph = ...
 
-DataSetTuple2Long, Double minWeights = graph.reduceOnEdges(
+DataSetTuple2Long, Double minWeights = graph.groupReduceOnEdges(
new SelectMinWeight(), EdgeDirection.OUT);
 
 // user-defined function to select the minimum weight
-static final class SelectMinWeight implements EdgesFunctionLong, Double, 
Tuple2Long, Double {
+static final class SelectMinWeightNeighbor implements 
EdgesFunctionWithVertexValueLong, Long, Long, Tuple2Long, Long {
 
-public Tuple2Long, Double iterateEdges(IterableTuple2Long, 
EdgeLong, Double edges) {
+   @Override
+   public void iterateEdges(VertexLong, Long v,
+   IterableEdgeLong, Long edges, 
CollectorTuple2Long, Long out) throws Exception {
 
-long minWeight = Double.MAX_VALUE;
-long vertexId = -1;
+   long weight = Long.MAX_VALUE;
+   long minNeighborId = 0;
 
-for (Tuple2Long, EdgeLong, Double edge: edges) {
-if (edge.f1.getValue()  weight) {
-weight = edge.f1.getValue();
-vertexId = edge.f0;
-}
-return new Tuple2Long, Double(vertexId, minWeight);
-}
-}
+   for (EdgeLong, Long edge: edges) {
+   if (edge.getValue()  weight) {
+   weight = edge.getValue();
+   minNeighborId = edge.getTarget();
+   }
+   }
+   out.collect(new Tuple2Long, Long(v.getId(), 
minNeighborId));
+   }
+   }
--- End diff --

Also, the description says `SelectMinWeight` but the method name is 
`SelectMinWeightNeighbor`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1758][gelly] Neighborhood Methods Exten...

2015-04-17 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/576#issuecomment-93996607
  
Hi @andralungu! Thanks for the quick update ^^
The result looks good, but I have a few more comments/suggestions (apart 
from the inline ones).

- +1 for the `groupReduce*` implementations. These look great :-)

- Regarding the combinable versions, I think that the current API is not 
very intuitive. I understand that you did it like this because you were 
restricted by the return type of the `ReduceFunction`. I think that it would be 
better if we simplify these to operate directly on the values, instead of 
tuples with nested Edge and Vertex data.
For example, say you want to compute the sum of all out-neighbors of a 
vertex. The value types to combine inside the reduce function should be the 
same as the vertex value type, instead of a Tuple3.
In the case of reducing on edges, the type should be the same as the edge 
value type.
Thus, I would make `reduceOnEdges` return `DataSetTuple2K, EV` and 
`reduceOnNeighbors` return `DataSetTuple2K, VV`, i.e. one value per vertex 
ID.

-  There is some inconsistency in the naming of the methods, for example 
the user-defined methods in the group case are called `iterateOn...` and in the 
reduce case they are called `reduce...`. We should stick to one of these :-) 
Also, maybe we could find a better name for `ReduceEdgesFunction` and 
`ReduceNeighborsFunction`. I don't have a suggestion right now, but I'll think 
about it.

-  Finally, there are several warnings in your code for unchecked and raw 
types. Could you turn on warning notifications in your IDE and try to add 
suppress annotations to get of them?

Let me know what you think and whether you have any questions!

Thanks again!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1758][gelly] Neighborhood Methods Exten...

2015-04-13 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/576#issuecomment-92270780
  
Hello @vasia ,

I updated the PR to use reduce where possible(for the functions which 
return one value per vertex).
Let me know if there is anything I missed.

Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1758][gelly] Neighborhood Methods Exten...

2015-04-07 Thread andralungu
GitHub user andralungu opened a pull request:

https://github.com/apache/flink/pull/576

[FLINK-1758][gelly] Neighborhood Methods Extensions



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andralungu/flink extendNeighborhood

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/576.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #576


commit e5f6571c7cb05d96aac84530beb52cef968243d7
Author: andralungu lungu.an...@gmail.com
Date:   2015-04-07T20:29:18Z

[FLINK-1758][gelly] Neighborhood Methods Extensions




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1758][gelly] Neighborhood Methods Exten...

2015-04-07 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/576#issuecomment-90749788
  
Thanks for the PR @andralungu!

The idea for this issue is to (a) use a combinable reduce for the simple 
(without vertex value) reduceOn* methods, because it will be more efficient and 
(b) offer groupReduceOn* versions with Collectors to allow returning more than 
one value per vertex.
I see you have implemented (b) here, but not (a).

To be more specific, the groupReduceOn* methods would be the ones with the 
Collectors. Basically, you'll just have to rename what you have already 
implemented. The reduceOn* methods will only return one value per vertex, like 
they do currently, but internally use the reduce operator (instead of 
reduceGroup).

For example, if you want to compute the sum of edge values, you can use the 
`reduceOnEdges` method, same as you currently do. And if you want to compute 
the Jaccard coefficient, you need to output multiple values per vertex and for 
that you can use `groupReduceOnEdges`.

Please let me know if you have any questions! Thanks again :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---