[jira] [Commented] (FLINK-5506) Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274340#comment-16274340 ] Vasia Kalavri commented on FLINK-5506:
--
I only had a quick look at the code; I will need to re-read the paper to make sure the algorithm semantics are correct with the following: I believe the problem is line 147 in {{CommunityDetection.java}}. The code assumes we have received only positive scores, while negative ones are indeed possible. Changing this line to {{double maxScore = -Double.MAX_VALUE;}} should fix it.

> Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
>
> Key: FLINK-5506
> URL: https://issues.apache.org/jira/browse/FLINK-5506
> Project: Flink
> Issue Type: Bug
> Components: Gelly
> Affects Versions: 1.1.4, 1.3.2, 1.4.1
> Reporter: Miguel E. Coimbra
> Labels: easyfix, newbie
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> Reporting this here as per Vasia's advice.
> I am having the following problem while trying out the
> org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly API (Java).
> Specs: JDK 1.8.0_102 x64
> Apache Flink: 1.1.4
> Suppose I have a very small (I tried an example with 38 vertices as well)
> dataset stored in a tab-separated file 3-vertex.tsv:
> {code}
> # id1	id2	score
> 0	1	0
> 0	2	0
> 0	3	0
> {code}
> This is just a central vertex with 3 neighbors (disconnected between themselves).
> I am loading the dataset and executing the algorithm with the following code:
> {code}
> // Load the data from the .tsv file.
> final DataSet<Tuple3<Long, Long, Double>> edgeTuples = env.readCsvFile(inputPath)
>     .fieldDelimiter("\t") // fields are separated by tabs
>     .ignoreComments("#")  // comments start with "#"
>     .types(Long.class, Long.class, Double.class);
> // Generate a graph and add reverse edges (undirected).
> final Graph<Long, Long, Double> graph = Graph.fromTupleDataSet(edgeTuples,
>     new MapFunction<Long, Long>() {
>         private static final long serialVersionUID = 8713516577419451509L;
>         public Long map(Long value) {
>             return value;
>         }
>     },
>     env).getUndirected();
> // CommunityDetection parameters.
> final double hopAttenuationDelta = 0.5d;
> final int iterationCount = 10;
> // Prepare and trigger the execution.
> DataSet<Vertex<Long, Long>> vs = graph.run(new
> org.apache.flink.graph.library.CommunityDetection(iterationCount, hopAttenuationDelta)).getVertices();
> vs.print();
> {code}
> Running this code throws the following exception (note the CommunityDetection.java:158 line):
> {code}
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.NullPointerException
>     at org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158)
>     at org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389)
>     at org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218)
>     at org.apache.flink.runtime.op
> {code}
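The suspected failure mode can be reproduced without Flink. The sketch below is illustrative only, not the actual CommunityDetection code: `argMax` stands in for the label-selection loop around line 147, and `initialMax` stands in for the initializer the comment above proposes changing. With a non-negative initial maximum, an all-negative score map never beats it, so no label is selected and the null result would explain the NPE at line 158.

```java
import java.util.HashMap;
import java.util.Map;

public class MaxScoreSketch {
    /** Returns the label with the highest score, or null if no score exceeds initialMax. */
    static Long argMax(Map<Long, Double> labelScores, double initialMax) {
        double maxScore = initialMax;
        Long maxScoreLabel = null;
        for (Map.Entry<Long, Double> e : labelScores.entrySet()) {
            if (e.getValue() > maxScore) {
                maxScore = e.getValue();
                maxScoreLabel = e.getKey();
            }
        }
        return maxScoreLabel;
    }

    public static void main(String[] args) {
        Map<Long, Double> scores = new HashMap<>();
        scores.put(1L, -0.5);  // hop attenuation can drive scores below zero
        scores.put(2L, -1.5);

        // A non-negative starting max: no negative score is ever picked, result is null.
        assert argMax(scores, 0.0) == null;

        // The fix proposed in the comment: start from the most negative double.
        assert argMax(scores, -Double.MAX_VALUE) == 1L;
    }
}
```

Note that `Double.MIN_VALUE` would exhibit the same bug, since it is the smallest positive double, not the most negative one; `-Double.MAX_VALUE` is the correct lower bound.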
[GitHub] flink pull request #4179: [FLINK-6989] [gelly] Refactor examples with Output...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/4179#discussion_r124514726
--- Diff: flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/drivers/parameter/Parameter.java ---
@@ -40,6 +40,15 @@ String getUsage();
 /**
+* A hidden parameter is parsed from the command-line configuration but is
+* not printed in the usage string. This can be used for power-user options
+* not displayed to the general user.
--- End diff --
This sounds interesting. Can you give an example of when this might be useful?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #3433: [FLINK-5911] [gelly] Command-line parameters
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/3433#discussion_r108067501
--- Diff: flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/drivers/parameter/Parameter.java ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.drivers.parameter;
+
+import org.apache.flink.api.java.utils.ParameterTool;
+
+/**
+ * Encapsulates the usage and configuration of a command-line parameter.
+ *
+ * @param <T> parameter value type
+ */
+public interface Parameter<T> {
+
+	/**
+	 * An informal usage string. Parameter names are prefixed with "--".
+	 *
+	 * Optional parameters are enclosed by "[" and "]".
+	 *
+	 * Generic values are represented by all-caps with specific values enclosed
+	 * by "<" and ">".
+	 *
+	 * @return command-line usage string
+	 */
+	String getParameterization();
--- End diff --
Why not `getUsage()`?
[GitHub] flink issue #3431: [FLINK-5910] [gelly] Framework for Gelly examples
Github user vasia commented on the issue: https://github.com/apache/flink/pull/3431 Thanks! Then, it's good to go from my side.
[GitHub] flink issue #3434: [FLINK-5909] [gelly] Interface for GraphAlgorithm results
Github user vasia commented on the issue: https://github.com/apache/flink/pull/3434 Thanks @greghogan. Looks good!
[GitHub] flink issue #3434: [FLINK-5909] [gelly] Interface for GraphAlgorithm results
Github user vasia commented on the issue: https://github.com/apache/flink/pull/3434 Does `AnalyticResult` also need a method like `toVerboseString()`? Could we replace both with e.g. a `Result` type?
[jira] [Commented] (FLINK-2910) Reorganize / Combine Gelly tests
[ https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896927#comment-15896927 ] Vasia Kalavri commented on FLINK-2910: -- I think that's a good idea [~greghogan]. > Reorganize / Combine Gelly tests > > > Key: FLINK-2910 > URL: https://issues.apache.org/jira/browse/FLINK-2910 > Project: Flink > Issue Type: Test > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > Fix For: 1.3.0 > > > - Some tests that are spread out in different classes could be combined as well, > e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 > for neighborhood methods, etc. > - Testing a binary operator (i.e. union and difference) is done in two > similar tests: one is testing the expected vertex set and one the expected > edge set. This can be combined in one test per operator using > {{LocalCollectionOutputFormat<>}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
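The consolidation described in the issue can be illustrated with a Flink-free sketch. A real Gelly test would collect both DataSets via {{LocalCollectionOutputFormat<>}}; the plain-collection "operator" below is only a stand-in. The point is the test shape: one test per binary operator that asserts the expected vertex set and the expected edge set together, instead of two near-identical test classes.

```java
import java.util.Set;
import java.util.TreeSet;

public class UnionTestSketch {
    /** Plain-collection stand-in for the vertex half of Graph.union. */
    static Set<Long> unionVertices(Set<Long> a, Set<Long> b) {
        Set<Long> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }

    /** Plain-collection stand-in for the edge half of Graph.union. */
    static Set<String> unionEdges(Set<String> a, Set<String> b) {
        Set<String> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        Set<Long> v1 = Set.of(1L, 2L), v2 = Set.of(2L, 3L);
        Set<String> e1 = Set.of("1-2"), e2 = Set.of("2-3");

        // One test covers the operator: the vertex set and the edge set are
        // checked in the same place, as the issue proposes.
        assert unionVertices(v1, v2).equals(Set.of(1L, 2L, 3L));
        assert unionEdges(e1, e2).equals(Set.of("1-2", "2-3"));
    }
}
```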
[GitHub] flink issue #3434: [FLINK-5909] [gelly] Interface for GraphAlgorithm results
Github user vasia commented on the issue: https://github.com/apache/flink/pull/3434 Thanks for the clarification @greghogan. +1 from me.
[GitHub] flink issue #3434: [FLINK-5909] [gelly] Interface for GraphAlgorithm results
Github user vasia commented on the issue: https://github.com/apache/flink/pull/3434 Hi @greghogan, thank you for the PR. I didn't spot anything that needs fixing, but I'm wondering what the motivation is for adding these interfaces. I see how `toVerboseString()` is useful, but not really why `AnalyticResult` is needed. Also, why introduce `UnaryResult`, `BinaryResult`, and `TertiaryResult` instead of simply using tuple types? I also see that this PR contains no changes to the docs and that the current 1.3-SNAPSHOT docs already reflect the changes of this PR. What am I missing here?
[jira] [Comment Edited] (FLINK-4949) Refactor Gelly driver inputs
[ https://issues.apache.org/jira/browse/FLINK-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892696#comment-15892696 ] Vasia Kalavri edited comment on FLINK-4949 at 3/2/17 6:05 PM: -- Thank you [~greghogan]. I can review during the weekend. was (Author: vkalavri): Thanks you [~greghogan]. I can review during the weekend. > Refactor Gelly driver inputs > > > Key: FLINK-4949 > URL: https://issues.apache.org/jira/browse/FLINK-4949 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.2.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.3.0 > > > The Gelly drivers started as simple wrappers around library algorithms but > have grown to handle a matrix of input sources while often running multiple > algorithms and analytics with custom parameterization. > This ticket will refactor the sourcing of the input graph into separate > classes for CSV files and RMat which will simplify the inclusion of new data > sources.
[jira] [Commented] (FLINK-4949) Refactor Gelly driver inputs
[ https://issues.apache.org/jira/browse/FLINK-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892696#comment-15892696 ] Vasia Kalavri commented on FLINK-4949: -- Thanks you [~greghogan]. I can review during the weekend. > Refactor Gelly driver inputs > > > Key: FLINK-4949 > URL: https://issues.apache.org/jira/browse/FLINK-4949 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.2.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.3.0 > > > The Gelly drivers started as simple wrappers around library algorithms but > have grown to handle a matrix of input sources while often running multiple > algorithms and analytics with custom parameterization. > This ticket will refactor the sourcing of the input graph into separate > classes for CSV files and RMat which will simplify the inclusion of new data > sources.
[GitHub] flink issue #2733: [FLINK-4896] [gelly] PageRank algorithm for directed grap...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2733 Thanks for the results @greghogan! Updating the docs would be nice, otherwise +1. Do you think we should move the existing implementations to the examples (they are still relevant for demonstrating the iteration APIs) and keep this one as the only library method, since it's the fastest and most general one?
[jira] [Updated] (FLINK-2910) Reorganize / Combine Gelly tests
[ https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-2910: - Summary: Reorganize / Combine Gelly tests (was: Combine tests for binary graph operators) > Reorganize / Combine Gelly tests > > > Key: FLINK-2910 > URL: https://issues.apache.org/jira/browse/FLINK-2910 > Project: Flink > Issue Type: Test > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > Fix For: 1.3.0 > > > - Some tests that are spread out in different classes could be combined as well, > e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 > for neighborhood methods, etc. > - Testing a binary operator (i.e. union and difference) is done in two > similar tests: one is testing the expected vertex set and one the expected > edge set. This can be combined in one test per operator using > {{LocalCollectionOutputFormat<>}}
[jira] [Updated] (FLINK-2910) Combine tests for binary graph operators
[ https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-2910: - Description: - Some tests that are spread out in different classes could be combined as well, e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 for neighborhood methods, etc. - Testing a binary operator (i.e. union and difference) is done in two similar tests: one is testing the expected vertex set and one the expected edge set. This can be combined in one test per operator using {{LocalCollectionOutputFormat<>}} was:Atm, testing a binary operator (i.e. union and difference) is done in two similar tests: one is testing the expected vertex set and one the expected edge set. This can be combined in one test per operator using {{LocalCollectionOutputFormat<>}} > Combine tests for binary graph operators > > > Key: FLINK-2910 > URL: https://issues.apache.org/jira/browse/FLINK-2910 > Project: Flink > Issue Type: Test > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > Fix For: 1.3.0 > > > - Some tests that are spread out in different classes could be combined as well, > e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 > for neighborhood methods, etc. > - Testing a binary operator (i.e. union and difference) is done in two > similar tests: one is testing the expected vertex set and one the expected > edge set. This can be combined in one test per operator using > {{LocalCollectionOutputFormat<>}}
[jira] [Updated] (FLINK-2910) Combine tests for binary graph operators
[ https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-2910: - Fix Version/s: 1.3.0 > Combine tests for binary graph operators > > > Key: FLINK-2910 > URL: https://issues.apache.org/jira/browse/FLINK-2910 > Project: Flink > Issue Type: Test > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > Fix For: 1.3.0 > > > Atm, testing a binary operator (i.e. union and difference) is done in two > similar tests: one is testing the expected vertex set and one the expected > edge set. This can be combined in one test per operator using > {{LocalCollectionOutputFormat<>}}
[jira] [Commented] (FLINK-2910) Combine tests for binary graph operators
[ https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889618#comment-15889618 ] Vasia Kalavri commented on FLINK-2910: -- Thanks for the heads-up [~uce]. Yes, this is still relevant. I will update it. [~mju] are you still planning to work on this? > Combine tests for binary graph operators > > > Key: FLINK-2910 > URL: https://issues.apache.org/jira/browse/FLINK-2910 > Project: Flink > Issue Type: Test > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > > Atm, testing a binary operator (i.e. union and difference) is done in two > similar tests: one is testing the expected vertex set and one the expected > edge set. This can be combined in one test per operator using > {{LocalCollectionOutputFormat<>}}
[GitHub] flink issue #2885: [FLINK-1707] Affinity propagation
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2885 Hi @joseprupi, could you please rebase this PR? Right now it's not clear which changes are yours, which makes the PR hard to review. Thanks!
[jira] [Updated] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations
[ https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-5127: - Affects Version/s: 1.1.0 1.2.0 > Reduce the amount of intermediate data in vertex-centric iterations > --- > > Key: FLINK-5127 > URL: https://issues.apache.org/jira/browse/FLINK-5127 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.0, 1.2.0 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > Fix For: 1.3.0 > > > The vertex-centric plan contains a join between the workset (messages) and > the solution set (vertices) that outputs <Vertex, Message> tuples. This > intermediate dataset is then co-grouped with the edges to provide the Pregel > interface directly. > This issue proposes an improvement to reduce the size of this intermediate > dataset. In particular, the vertex state does not have to be attached to all > the output tuples of the join. If we replace the join with a coGroup and use > an `Either` type, we can attach the vertex state to the first tuple only. The > subsequent coGroup can retrieve the vertex state from the first tuple and > correctly expose the Pregel interface. > In my preliminary experiments, I find that this change reduces intermediate > data by 2x for small vertex state and 4-5x for large vertex states.
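The coGroup-plus-`Either` layout proposed in FLINK-5127 can be sketched without Flink. The real implementation would use `org.apache.flink.types.Either` inside the iteration's coGroup; the nested `Either` class and the `coGroupOutput` method below are illustrative stand-ins, not the actual code. The idea being demonstrated: instead of a join that attaches the (possibly large) vertex state to every `<Vertex, Message>` tuple, the state travels once as a `Left` record and each message as a lightweight `Right` record.

```java
import java.util.ArrayList;
import java.util.List;

public class EitherSketch {
    // Minimal stand-in for Flink's org.apache.flink.types.Either.
    static class Either<L, R> {
        final L left;
        final R right;
        private Either(L l, R r) { left = l; right = r; }
        static <L, R> Either<L, R> left(L l)  { return new Either<>(l, null); }
        static <L, R> Either<L, R> right(R r) { return new Either<>(null, r); }
        boolean isLeft() { return left != null; }
    }

    /**
     * Pairs one vertex's state with its incoming messages: the state is
     * attached to the first output record only, so it is not duplicated
     * across all messages as it would be in the join-based plan.
     */
    static List<Either<String, Double>> coGroupOutput(String vertexState, List<Double> messages) {
        List<Either<String, Double>> out = new ArrayList<>();
        out.add(Either.left(vertexState));   // vertex state, sent once
        for (Double m : messages) {
            out.add(Either.right(m));        // each message, without the state
        }
        return out;
    }
}
```

The downstream coGroup can then read the vertex state from the leading `Left` record and treat every following `Right` record as a plain message, which is what makes the 2x-5x intermediate-data reduction mentioned in the issue plausible for large vertex states.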
[jira] [Updated] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations
[ https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-5127: - Fix Version/s: 1.3.0 > Reduce the amount of intermediate data in vertex-centric iterations > --- > > Key: FLINK-5127 > URL: https://issues.apache.org/jira/browse/FLINK-5127 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.0, 1.2.0 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > Fix For: 1.3.0 > > > The vertex-centric plan contains a join between the workset (messages) and > the solution set (vertices) that outputs <Vertex, Message> tuples. This > intermediate dataset is then co-grouped with the edges to provide the Pregel > interface directly. > This issue proposes an improvement to reduce the size of this intermediate > dataset. In particular, the vertex state does not have to be attached to all > the output tuples of the join. If we replace the join with a coGroup and use > an `Either` type, we can attach the vertex state to the first tuple only. The > subsequent coGroup can retrieve the vertex state from the first tuple and > correctly expose the Pregel interface. > In my preliminary experiments, I find that this change reduces intermediate > data by 2x for small vertex state and 4-5x for large vertex states.
[jira] (FLINK-1526) Add Minimum Spanning Tree library method and example
Vasia Kalavri commented on FLINK-1526 Re: Add Minimum Spanning Tree library method and example Hi Xingcan Cui, the problem is that currently you cannot have an iteration (e.g. vertex-centric) inside a for-loop or a while-loop. So, your pseudocode won't work (well, it will, but only for very small inputs). I believe "no value updates" refers to no vertex values changing. Where did you see this?
[jira] (FLINK-1526) Add Minimum Spanning Tree library method and example
Vasia Kalavri commented on FLINK-1526 Re: Add Minimum Spanning Tree library method and example Hi Xingcan Cui, thank you for your interest in this issue. As you can see in the comment history, contributors have had problems completing this task without support for for-loop iterations. Are you planning to take a different approach? Could you describe how you're planning to proceed? Thanks!
[jira] [Created] (FLINK-5597) Improve the LocalClusteringCoefficient documentation
Vasia Kalavri created FLINK-5597: Summary: Improve the LocalClusteringCoefficient documentation Key: FLINK-5597 URL: https://issues.apache.org/jira/browse/FLINK-5597 Project: Flink Issue Type: Bug Components: Documentation, Gelly Reporter: Vasia Kalavri The LocalClusteringCoefficient usage section should explain what the algorithm output is and how to retrieve the actual local clustering coefficient scores from it.
[GitHub] flink issue #2885: [FLINK-1707] Affinity propagation
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2885 Hi @joseprupi, thank you for the PR! This should replace #2053, right? If yes, could you please close #2053? Thanks!
[jira] [Created] (FLINK-5434) Remove unsupported project() transformation from Scala DataStream docs
Vasia Kalavri created FLINK-5434: Summary: Remove unsupported project() transformation from Scala DataStream docs Key: FLINK-5434 URL: https://issues.apache.org/jira/browse/FLINK-5434 Project: Flink Issue Type: Bug Components: Documentation Reporter: Vasia Kalavri The Scala DataStream does not have a project() transformation, yet the docs include it as a supported operation.
[jira] [Created] (FLINK-5351) Make the TypeExtractor support functions with more than 2 inputs
Vasia Kalavri created FLINK-5351: Summary: Make the TypeExtractor support functions with more than 2 inputs Key: FLINK-5351 URL: https://issues.apache.org/jira/browse/FLINK-5351 Project: Flink Issue Type: Improvement Components: Gelly, Type Serialization System Reporter: Vasia Kalavri Currently, the TypeExtractor doesn't support functions with more than 2 inputs. We found that adding such support would be a useful feature for Gelly in FLINK-5097.
[jira] [Resolved] (FLINK-5097) The TypeExtractor is missing input type information in some Graph methods
[ https://issues.apache.org/jira/browse/FLINK-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-5097. -- Resolution: Fixed Fix Version/s: 1.2.0 > The TypeExtractor is missing input type information in some Graph methods > - > > Key: FLINK-5097 > URL: https://issues.apache.org/jira/browse/FLINK-5097 > Project: Flink > Issue Type: Bug > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > Fix For: 1.2.0 > > > The TypeExtractor is called without information about the input type in > {{mapVertices}} and {{mapEdges}} although this information can be easily > retrieved.
[jira] [Updated] (FLINK-5311) Write user documentation for BipartiteGraph
[ https://issues.apache.org/jira/browse/FLINK-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-5311: - Issue Type: Improvement (was: Bug) > Write user documentation for BipartiteGraph > --- > > Key: FLINK-5311 > URL: https://issues.apache.org/jira/browse/FLINK-5311 > Project: Flink > Issue Type: Improvement > Components: Gelly >Reporter: Ivan Mushketyk >Assignee: Ivan Mushketyk > Fix For: 1.2.0 > > > We need to add user documentation. The progress on BipartiteGraph can be > tracked in the following JIRA: > https://issues.apache.org/jira/browse/FLINK-2254
[jira] [Resolved] (FLINK-5311) Write user documentation for BipartiteGraph
[ https://issues.apache.org/jira/browse/FLINK-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-5311. -- Resolution: Fixed Fix Version/s: 1.2.0 > Write user documentation for BipartiteGraph > --- > > Key: FLINK-5311 > URL: https://issues.apache.org/jira/browse/FLINK-5311 > Project: Flink > Issue Type: Bug > Components: Gelly >Reporter: Ivan Mushketyk >Assignee: Ivan Mushketyk > Fix For: 1.2.0 > > > We need to add user documentation. The progress on BipartiteGraph can be > tracked in the following JIRA: > https://issues.apache.org/jira/browse/FLINK-2254
[GitHub] flink issue #2984: [FLINK-5311] Add user documentation for bipartite graph
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2984 Thank you @mushketyk. That's OK. I'm merging this PR.
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 Done. I'll wait for travis, then merge.
[jira] [Commented] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations
[ https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751890#comment-15751890 ] Vasia Kalavri commented on FLINK-5127: -- It'd be nice to have this for 1.2, but I don't know when I'll have time to work on it. I'm hoping this weekend. > Reduce the amount of intermediate data in vertex-centric iterations > --- > > Key: FLINK-5127 > URL: https://issues.apache.org/jira/browse/FLINK-5127 > Project: Flink > Issue Type: Improvement > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > > The vertex-centric plan contains a join between the workset (messages) and > the solution set (vertices) that outputs <Vertex, Message> tuples. This > intermediate dataset is then co-grouped with the edges to provide the Pregel > interface directly. > This issue proposes an improvement to reduce the size of this intermediate > dataset. In particular, the vertex state does not have to be attached to all > the output tuples of the join. If we replace the join with a coGroup and use > an `Either` type, we can attach the vertex state to the first tuple only. The > subsequent coGroup can retrieve the vertex state from the first tuple and > correctly expose the Pregel interface. > In my preliminary experiments, I find that this change reduces intermediate > data by 2x for small vertex state and 4-5x for large vertex states.
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 Sure, I can revert using `getMapReturnTypes`, rebase, and merge.
[GitHub] flink issue #2984: [FLINK-5311] Add user documentation for bipartite graph
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2984 Hi @mushketyk, thank you for the update! Just a couple of small things and we can merge: - Can you add a note in the beginning of the docs that bipartite graphs are only currently supported in the Gelly Java API? - I would rename the "Graph transformations" section to "Projection".
[jira] [Commented] (FLINK-5245) Add support for BipartiteGraph mutations
[ https://issues.apache.org/jira/browse/FLINK-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744495#comment-15744495 ] Vasia Kalavri commented on FLINK-5245: -- My point is not that these features are useless for bipartite graphs, but that we have to think about whether re-implementing these features specifically for bipartite graphs makes sense, e.g. because general graphs do not support them, or because we can use the knowledge that we have a bipartite graph to make the implementation more efficient. For example, projection is a transformation that can only be applied to bipartite graphs. But if all you want to do is get the degrees of your bipartite graph, can you use the available Graph methods? Or can we provide a better way to get the degrees because we know we have a bipartite graph? These are the questions we have to ask for each of the features in the list, in my opinion. > Add support for BipartiteGraph mutations > > > Key: FLINK-5245 > URL: https://issues.apache.org/jira/browse/FLINK-5245 > Project: Flink > Issue Type: Improvement > Components: Gelly >Reporter: Ivan Mushketyk >Assignee: Ivan Mushketyk > > Implement methods for adding and removing vertices and edges similarly to > Graph class. > Depends on https://issues.apache.org/jira/browse/FLINK-2254
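As a concrete illustration of the question above: the degree of a top vertex can be read in one pass over the bipartite edge list, whereas going through a projection does strictly more work and answers a different question (co-membership, not incidence). This is a plain-Python sketch, not Gelly API code; the sample edge list and all names are invented:

```python
# Plain-Python model (NOT the Gelly API): top-vertex degrees straight from
# the bipartite edge list vs. building a top projection first.
from collections import Counter
from itertools import combinations

bipartite_edges = [  # (top_id, bottom_id) pairs, e.g. (paper, author)
    ("p1", "a1"), ("p1", "a2"), ("p2", "a1"), ("p2", "a3"), ("p3", "a1"),
]

# Direct: one pass over the edges gives each paper's author count.
top_degrees = Counter(top for top, _ in bipartite_edges)

# Via projection: connect two top vertices whenever they share a bottom
# vertex -- more work, and the neighbour count means "co-authored papers",
# not "number of authors".
by_bottom = {}
for top, bottom in bipartite_edges:
    by_bottom.setdefault(bottom, set()).add(top)
projected = {pair for tops in by_bottom.values()
             for pair in combinations(sorted(tops), 2)}
```

Here `top_degrees` answers "how many bottom neighbours does each top vertex have" directly, while `projected` (3 pairs, all induced by the shared bottom vertex `a1`) is the larger one-mode structure a projection would materialize.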
[GitHub] flink issue #2984: [FLINK-5311] Add user documentation for bipartite graph
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2984 Thank you for the update @mushketyk! I still don't see any link from the Gelly guide page to the bipartite docs though. Can you please add that too? Otherwise people won't be able to find the docs :) As for the images, I think it would be nice to have one that shows how a projection works.
[jira] [Commented] (FLINK-5245) Add support for BipartiteGraph mutations
[ https://issues.apache.org/jira/browse/FLINK-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742744#comment-15742744 ] Vasia Kalavri commented on FLINK-5245: -- I don't think so. We used to have some simple examples that showcased how to create a graph in this way, but I don't think we really need such methods for the bipartite graph. That said, we should probably go through all the bipartite features and decide whether they are useful, e.g. validator and generators. Do they even make sense for bipartite graphs? Or when do they? > Add support for BipartiteGraph mutations > > > Key: FLINK-5245 > URL: https://issues.apache.org/jira/browse/FLINK-5245 > Project: Flink > Issue Type: Improvement > Components: Gelly >Reporter: Ivan Mushketyk >Assignee: Ivan Mushketyk > > Implement methods for adding and removing vertices and edges similarly to > Graph class. > Depends on https://issues.apache.org/jira/browse/FLINK-2254
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908218 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. 
For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs + + + +Graph Representation + + +A `BipartiteGraph` is represented by: + * `DataSet` of top nodes + * `DataSet` of bottom nodes + * `DataSet` of edges between top and bottom nodes + +As in the `Graph` class nodes are represented by the `Vertex` type and the same rules applies to its types and values. --- End diff -- *apply --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91907989 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. --- End diff -- a relationships => relationships --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908101 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs --- End diff -- *graphs --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908174 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs + + + +Graph Representation + + +A `BipartiteGraph` is represented by: + * `DataSet` of top nodes --- End diff -- *a DataSet... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908875 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. 
For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs + + + +Graph Representation + + +A `BipartiteGraph` is represented by: + * `DataSet` of top nodes + * `DataSet` of bottom nodes + * `DataSet` of edges between top and bottom nodes + +As in the `Graph` class nodes are represented by the `Vertex` type and the same rules applies to its types and values. + +The graph edges are represented by the `BipartiteEdge` type. An `BipartiteEdge` is defined by a top ID (the ID of the top `Vertex`), a bottom ID (the ID of the bottom `Vertex`) and an optional value. The main difference between the `Edge` and `BipartiteEdge` is that IDs of nodes it links can be of different types. Edges with no value have a `NullValue` value type. + + + +{% highlight java %} +BipartiteEdge<Long, String, Double> e = new BipartiteEdge<Long, String, Double>(1L, "id1", 0.5); + +Double weight = e.getValue(); // weight = 0.5 +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented +{% endhighlight %} + + +{% top %} + + +Graph Creation +-- + +You can create a `BipartiteGraph` in the following ways: + +* from a `DataSet` of top vertices, a `DataSet` of bottom vertices and a `DataSet` of edges: + + + +{% highlight java %} +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +DataSet<Vertex<String, Long>> topVertices = ... + +DataSet<Vertex<String, Long>> bottomVertices = ... + +DataSet<Edge<String, String, Double>> edges = ... 
+ +Graph<String, String, Long, Long, Double> graph = BipartiteGraph.fromDataSet(topVertices, bottomVertices, edges, env); +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented +{% endhighlight %} + + + + +Graph Transformations +- + + +* Projection: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and create a link between them in a new graph only if there is an intermediate bottom node both top nodes connect to in the original graph. Bottom projection is the opposite to top projection, i.e. only preserves bottom nodes and connects a pair of node if they are connected in the original graph. + +Gelly supports two sub-types of projections: simple projections and full projections. The only difference between them is what data is associated with edges in the result graph. + +In case of a simple projection each node in the result graph contains a pair of values of bipartite edges that connect nodes in the original graph: --- End diff -- *the case --- If your project is set up for it, you can
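The "simple projection" semantics quoted above (two top vertices become connected when they share a bottom vertex, and the new edge carries the pair of values of the two bipartite edges that met there) can be modeled outside Flink. A plain-Python sketch, not the Gelly API; `simple_top_projection` and the sample edges are invented for illustration:

```python
# Plain-Python model (NOT the Gelly API) of a simple top projection: for each
# bottom vertex, every pair of incident top vertices gets an edge whose value
# is the pair of values of the two original bipartite edges.
from itertools import combinations

bipartite_edges = [  # (top_id, bottom_id, value)
    ("p1", "a1", 0.5), ("p2", "a1", 0.7), ("p3", "a2", 0.1), ("p1", "a2", 0.9),
]

def simple_top_projection(edges):
    # Group incident (top, value) pairs by the bottom vertex they touch.
    by_bottom = {}
    for top, bottom, value in edges:
        by_bottom.setdefault(bottom, []).append((top, value))
    # Connect every pair of top vertices that meet at a shared bottom vertex.
    projected = []
    for incident in by_bottom.values():
        for (t1, v1), (t2, v2) in combinations(incident, 2):
            projected.append((t1, t2, (v1, v2)))  # edge value = value pair
    return projected

edges = simple_top_projection(bipartite_edges)
```

On this toy data, `a1` induces an edge between `p1` and `p2` valued `(0.5, 0.7)`, and `a2` induces one between `p3` and `p1` valued `(0.1, 0.9)`; a bottom projection would be the same routine grouped by top IDs instead.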
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91907816 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. --- End diff -- A single edge => an edge cannot connect to vertices => cannot connect *two vertices --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91907924 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. --- End diff -- a node between a top and a bottom nodes => an edge? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908589 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. 
For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs + + + +Graph Representation + + +A `BipartiteGraph` is represented by: + * `DataSet` of top nodes + * `DataSet` of bottom nodes + * `DataSet` of edges between top and bottom nodes + +As in the `Graph` class nodes are represented by the `Vertex` type and the same rules applies to its types and values. + +The graph edges are represented by the `BipartiteEdge` type. An `BipartiteEdge` is defined by a top ID (the ID of the top `Vertex`), a bottom ID (the ID of the bottom `Vertex`) and an optional value. The main difference between the `Edge` and `BipartiteEdge` is that IDs of nodes it links can be of different types. Edges with no value have a `NullValue` value type. + + + +{% highlight java %} +BipartiteEdge<Long, String, Double> e = new BipartiteEdge<Long, String, Double>(1L, "id1", 0.5); + +Double weight = e.getValue(); // weight = 0.5 +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented +{% endhighlight %} + + +{% top %} + + +Graph Creation +-- + +You can create a `BipartiteGraph` in the following ways: + +* from a `DataSet` of top vertices, a `DataSet` of bottom vertices and a `DataSet` of edges: + + + +{% highlight java %} +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +DataSet<Vertex<String, Long>> topVertices = ... + +DataSet<Vertex<String, Long>> bottomVertices = ... + +DataSet<Edge<String, String, Double>> edges = ... 
+ +Graph<String, String, Long, Long, Double> graph = BipartiteGraph.fromDataSet(topVertices, bottomVertices, edges, env); +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented --- End diff -- same as above --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908032 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): --- End diff -- *graphs --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91909237 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. 
For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored + * Bipartite graph can encode the same information more compactly than one-mode graphs + + + +Graph Representation + + +A `BipartiteGraph` is represented by: + * `DataSet` of top nodes + * `DataSet` of bottom nodes + * `DataSet` of edges between top and bottom nodes + +As in the `Graph` class nodes are represented by the `Vertex` type and the same rules applies to its types and values. + +The graph edges are represented by the `BipartiteEdge` type. An `BipartiteEdge` is defined by a top ID (the ID of the top `Vertex`), a bottom ID (the ID of the bottom `Vertex`) and an optional value. The main difference between the `Edge` and `BipartiteEdge` is that IDs of nodes it links can be of different types. Edges with no value have a `NullValue` value type. + + + +{% highlight java %} +BipartiteEdge<Long, String, Double> e = new BipartiteEdge<Long, String, Double>(1L, "id1", 0.5); + +Double weight = e.getValue(); // weight = 0.5 +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented +{% endhighlight %} + + +{% top %} + + +Graph Creation +-- + +You can create a `BipartiteGraph` in the following ways: + +* from a `DataSet` of top vertices, a `DataSet` of bottom vertices and a `DataSet` of edges: + + + +{% highlight java %} +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +DataSet<Vertex<String, Long>> topVertices = ... + +DataSet<Vertex<String, Long>> bottomVertices = ... + +DataSet<Edge<String, String, Double>> edges = ... 
+ +Graph<String, String, Long, Long, Double> graph = BipartiteGraph.fromDataSet(topVertices, bottomVertices, edges, env); +{% endhighlight %} + + + +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented +{% endhighlight %} + + + + +Graph Transformations +- + + +* Projection: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and create a link between them in a new graph only if there is an intermediate bottom node both top nodes connect to in the original graph. Bottom projection is the opposite to top projection, i.e. only preserves bottom nodes and connects a pair of node if they are connected in the original graph. + +Gelly supports two sub-types of projections: simple projections and full projections. The only difference between them is what data is associated with edges in the result graph. + +In case of a simple projection each node in the result graph contains a pair of values of bipartite edges that connect nodes in the original graph: + + + + +{% highlight java %} +ExecutionEnvironment env = Execution
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908073 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators +nav-parent_id: graphs +nav-pos: 6 +--- + + +* This will be replaced by the TOC +{:toc} + +Bipartite Graph +--- + +A bipartite graph (also called a two-mode graph) is a type of graph where vertices are separated into two disjoint sets. These sets are usually called top and bottom vertices. A single edge in this graph can only connect vertices from opposite sets (i.e. bottom vertex to top vertex) and cannot connect to vertices in the same set. + +Theses graphs have wide application in practice and can be a more natural choice for particular domains. For example to represent authorship of scientific papers top vertices can represent scientific papers while bottom nodes will represent authors. Naturally a node between a top and a bottom nodes would represent an authorship of a particular scientific paper. Another common example for applications of bipartite graphs is a relationships between actors and movies. In this case an edge represents that a particular actor played in a movie. + +Bipartite graph are used instead of regular graphs (one-mode) for the following practical [reasons](http://www.complexnetworks.fr/wp-content/uploads/2011/01/socnet07.pdf): + * They preserve more information about a connection between vertices. For example instead of a single link between two researchers in a graph that represents that they authored a paper together a bipartite graph preserve the information about what papers they authored --- End diff -- *preserves --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908249 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- +The graph edges are represented by the `BipartiteEdge` type. An `BipartiteEdge` is defined by a top ID (the ID of the top `Vertex`), a bottom ID (the ID of the bottom `Vertex`) and an optional value. The main difference between the `Edge` and `BipartiteEdge` is that IDs of nodes it links can be of different types. Edges with no value have a `NullValue` value type. --- End diff -- *A BipartiteEdge ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908641 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- +* Projection: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and create a link between them in a new graph only if there is an intermediate bottom node both top nodes connect to in the original graph. Bottom projection is the opposite to top projection, i.e. only preserves bottom nodes and connects a pair of node if they are connected in the original graph. --- End diff -- *creates ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908511 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- +{% highlight scala %} +// TODO: Should be added when Scala interface is implemented --- End diff -- Don't leave a TODO in the docs. Either state explicitly that BipartiteGraph currently only exists in the Java API or we should make sure to implement the Scala methods before merging this. ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908677 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- +* Projection: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and create a link between them in a new graph only if there is an intermediate bottom node both top nodes connect to in the original graph. Bottom projection is the opposite to top projection, i.e. only preserves bottom nodes and connects a pair of node if they are connected in the original graph. --- End diff -- *nodes ---
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91908769 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- +* Projection: Projection is a common operation for bipartite graphs that converts a bipartite graph into a regular graph. There are two types of projections: top and bottom projections. Top projection preserves only top nodes in the result graph and create a link between them in a new graph only if there is an intermediate bottom node both top nodes connect to in the original graph. Bottom projection is the opposite to top projection, i.e. only preserves bottom nodes and connects a pair of node if they are connected in the original graph. --- End diff -- Can you add a figure to illustrate a top and a bottom projection? ---
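The top-projection semantics discussed in this thread can be illustrated outside of Gelly. The following is a minimal plain-Java sketch (a hypothetical `TopProjection` helper, not the Gelly API): two top vertices become connected in the projected graph exactly when they share at least one bottom neighbor in the original bipartite graph.

```java
import java.util.*;

// Illustrative sketch of a bipartite top projection (not the Gelly API):
// two top vertices are linked iff they share a bottom neighbor.
class TopProjection {

    // edges: each entry is {topId, bottomId}; returns projected edges as "a-b" with a < b.
    static Set<String> project(int[][] edges) {
        // Group top vertices by the bottom vertex they attach to.
        Map<Integer, List<Integer>> byBottom = new HashMap<>();
        for (int[] e : edges) {
            byBottom.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        // Every pair of top vertices sharing a bottom vertex gets an edge.
        Set<String> projected = new TreeSet<>();
        for (List<Integer> tops : byBottom.values()) {
            for (int i = 0; i < tops.size(); i++) {
                for (int j = i + 1; j < tops.size(); j++) {
                    int a = Math.min(tops.get(i), tops.get(j));
                    int b = Math.max(tops.get(i), tops.get(j));
                    projected.add(a + "-" + b);
                }
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        // Papers 1, 2, 3 (top) and authors 10, 20, 30 (bottom):
        // papers 1 and 2 share author 10, so they are linked in the projection.
        int[][] edges = {{1, 10}, {2, 10}, {2, 20}, {3, 30}};
        System.out.println(project(edges)); // [1-2]
    }
}
```

For the authorship example from the quoted docs, this is how two researchers who co-authored a paper end up directly connected in the one-mode projection, at the cost of losing which paper connected them.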
[GitHub] flink pull request #2984: [FLINK-5311] Add user documentation for bipartite ...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2984#discussion_r91907660 --- Diff: docs/dev/libs/gelly/bipartite_graph.md --- @@ -0,0 +1,148 @@ +--- +title: Graph Generators --- End diff -- This shouldn't be Graph Generators I believe :) ---
[GitHub] flink issue #2564: [FLINK-2254] Add BipartiateGraph class
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2564 Thank you both for your work @mushketyk and @greghogan! Please, keep in mind that we should always add documentation for every new feature; especially a big one such as supporting a new graph type. We've added the checklist template for each new PR so that we don't forget about it :) Can you please open a JIRA to track that docs for bipartite graphs are missing? Thank you! ---
[jira] [Commented] (FLINK-1536) Graph partitioning operators for Gelly
[ https://issues.apache.org/jira/browse/FLINK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733284#comment-15733284 ] Vasia Kalavri commented on FLINK-1536: -- The idea is what [~greghogan] describes. In a distributed graph processing system, you first have to partition the graph before you perform any computation. The performance of graph algorithms greatly depends on the resulting partitioning. A bad partitioning might assign disproportionally more vertices to one partition thus hurting load balancing or it might partition the graph so that the communication required is too high (or both). Currently, we only support hash partitioning; that is, vertices are randomly assigned to workers using the hash of their id. This strategy has very low overhead and results in good load balancing unless the graphs are skewed. For more details on this problem, I suggest you read some of the papers in the literature linked in the description of the issue [~ivan.mushketyk]. > Graph partitioning operators for Gelly > -- > > Key: FLINK-1536 > URL: https://issues.apache.org/jira/browse/FLINK-1536 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Ivan Mushketyk >Priority: Minor > > Smart graph partitioning can significantly improve the performance and > scalability of graph analysis applications. Depending on the computation > pattern, a graph partitioning algorithm divides the graph into (maybe > overlapping) subgraphs, optimizing some objective. For example, if > communication is performed across graph edges, one might want to minimize the > edges that cross from one partition to another. > The problem of graph partitioning is a well studied problem and several > algorithms have been proposed in the literature. The goal of this project > would be to choose a few existing partitioning techniques and implement the > corresponding graph partitioning operators for Gelly. 
> Some related literature can be found [here| > http://www.citeulike.org/user/vasiakalavri/tag/graph-partitioning]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
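The hash-partitioning strategy described above can be sketched in a few lines of plain Java (an illustration of the idea, not Flink's actual partitioner): a vertex is assigned to a worker by hashing its id modulo the number of workers, which balances load well unless the id distribution is skewed.

```java
// Minimal sketch of hash partitioning (illustrative, not Flink internals):
// a vertex goes to the worker given by the hash of its id modulo the
// number of workers.
class HashPartitioning {

    static int assignWorker(long vertexId, int numWorkers) {
        // Math.floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(Long.hashCode(vertexId), numWorkers);
    }

    public static void main(String[] args) {
        for (long id = 0; id < 5; id++) {
            System.out.println("vertex " + id + " -> worker " + assignWorker(id, 4));
        }
    }
}
```

Smarter partitioners, as proposed in this issue, would instead optimize an objective such as minimizing cut edges rather than assigning vertices randomly.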
[GitHub] flink issue #2564: [FLINK-2254] Add BipartiateGraph class
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2564 I would go for `org.apache.flink.graph.bipartite`. I think that `bidirectional` simply suggests that each edge exists in both directions. ---
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 Hi @twalthr, thank you so much for looking into this. I'll create an issue for functions with > 2 inputs. I have replaced `createTypeInfo` with `getMapReturnTypes` where possible, but I'm getting a test failure now that I can't figure out. Please see `org.apache.flink.graph.scala.test.operations.GraphCreationWithCsvITCase#testCsvWithMapperValues`. Am I using the `getMapReturnTypes` method properly? ---
[GitHub] flink issue #2832: [FLINK-4936] [gelly] Operator names for Gelly inputs
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2832 Yes, will do in the following days, thanks! ---
[GitHub] flink issue #2764: [FLINK-5008] Update quickstart documentation
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2764 Sorry, no input from me regarding Eclipse. I've given up on it about a year ago ;) ---
[jira] [Closed] (FLINK-5161) accepting NullValue for VV in Gelly examples and GSA
[ https://issues.apache.org/jira/browse/FLINK-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri closed FLINK-5161. Resolution: Not A Problem > accepting NullValue for VV in Gelly examples and GSA > > > Key: FLINK-5161 > URL: https://issues.apache.org/jira/browse/FLINK-5161 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.3 >Reporter: wouter ligtenberg > > I made this topic a few days ago about EV, i meant VV back then, i don't know > why i suddenly thought about EV and it confused myself. > In this gelly example [1] and this GSA algorithm [2] a Vertex Value of Double > is required but never used, wouldn't it be better to change this into a > NullValue? I create a lot of data without Vertex Values and it seems to me > that it's more efficient > I'd like to hear your thoughts on this > [1] > https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/SingleSourceShortestPaths.java > [2] > https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/GSASingleSourceShortestPaths.java
[jira] [Commented] (FLINK-5161) accepting NullValue for VV in Gelly examples and GSA
[ https://issues.apache.org/jira/browse/FLINK-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695728#comment-15695728 ] Vasia Kalavri commented on FLINK-5161: -- Hi Wouter, the vertex value carries the distance from every vertex to the source. Since this is a weighted SSSP, this is of double type. Gelly examples favor simplicity and demonstrate functionality. As a user, you should use the library algorithms. And in the library algorithm that you link to, the vertex value is actually parametrized (see the last commit), so you can use any type you like.
[jira] [Closed] (FLINK-5152) accepting NullValue for EV in Gelly examples
[ https://issues.apache.org/jira/browse/FLINK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri closed FLINK-5152. Resolution: Not A Problem > accepting NullValue for EV in Gelly examples > > > Key: FLINK-5152 > URL: https://issues.apache.org/jira/browse/FLINK-5152 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.3 >Reporter: wouter ligtenberg > Fix For: 1.1.3 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > In this gelly example [1] an EdgeValue of Double is required but never used, > wouldn't it be better to change this into a NullValue? I create a lot of data > without Edge Values and it seems to me that it's more efficient > I'd like to hear your thoughts on this > [1] > https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/SingleSourceShortestPaths.java
[jira] [Commented] (FLINK-5152) accepting NullValue for EV in Gelly examples
[ https://issues.apache.org/jira/browse/FLINK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692732#comment-15692732 ] Vasia Kalavri commented on FLINK-5152: -- Hi [~otherwise777], this is an example of _weighted_ shortest paths. The edge value is added to the message in the scatter function, thus it cannot be NullValue. If you need a shortest paths implementation that ignores edge values, it should be easy to modify this example to do that.
[jira] [Created] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations
Vasia Kalavri created FLINK-5127: Summary: Reduce the amount of intermediate data in vertex-centric iterations Key: FLINK-5127 URL: https://issues.apache.org/jira/browse/FLINK-5127 Project: Flink Issue Type: Improvement Components: Gelly Reporter: Vasia Kalavri Assignee: Vasia Kalavri The vertex-centric plan contains a join between the workset (messages) and the solution set (vertices) that outputs <Vertex, Message> tuples. This intermediate dataset is then co-grouped with the edges to provide the Pregel interface directly. This issue proposes an improvement to reduce the size of this intermediate dataset. In particular, the vertex state does not have to be attached to all the output tuples of the join. If we replace the join with a coGroup and use an `Either` type, we can attach the vertex state to the first tuple only. The subsequent coGroup can retrieve the vertex state from the first tuple and correctly expose the Pregel interface. In my preliminary experiments, I find that this change reduces intermediate data by 2x for small vertex state and 4-5x for large vertex states.
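The encoding proposed in this issue can be sketched in plain Java (an illustration only, with hypothetical types; the actual change would use Flink's `Either` type and `CoGroupFunction`): instead of pairing the vertex state with every message tuple, the state travels only in the first element of each vertex's group.

```java
import java.util.*;

// Illustrative sketch (not Gelly internals): encode a vertex's message group
// as one Either.Left carrying the vertex state followed by Either.Rights
// carrying only messages, instead of attaching the state to every message.
class EitherGrouping {

    // A tiny Either: exactly one of 'left' (vertex state) or 'right' (message) is set.
    static final class Either<L, R> {
        final L left;
        final R right;
        private Either(L l, R r) { left = l; right = r; }
        static <L, R> Either<L, R> left(L l) { return new Either<>(l, null); }
        static <L, R> Either<L, R> right(R r) { return new Either<>(null, r); }
    }

    // Old layout: |messages| copies of the state. New layout: one copy per group.
    static List<Either<String, Integer>> encode(String vertexState, List<Integer> messages) {
        List<Either<String, Integer>> out = new ArrayList<>();
        out.add(Either.left(vertexState));   // vertex state attached once
        for (int m : messages) {
            out.add(Either.right(m));        // messages travel without the state
        }
        return out;
    }

    static long stateCopies(List<Either<String, Integer>> group) {
        return group.stream().filter(e -> e.left != null).count();
    }

    public static void main(String[] args) {
        List<Either<String, Integer>> group = encode("state-of-v1", Arrays.asList(7, 8, 9));
        System.out.println(stateCopies(group)); // 1
    }
}
```

The downstream consumer reads the vertex state from the first (Left) element and then processes the remaining (Right) messages, which is how the subsequent coGroup described above can still expose the full Pregel interface.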
[jira] [Commented] (FLINK-1536) Graph partitioning operators for Gelly
[ https://issues.apache.org/jira/browse/FLINK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15686870#comment-15686870 ] Vasia Kalavri commented on FLINK-1536: -- This issue does not refer to bipartite graphs, even though we could extend it. It was initially created as a Google Summer of Code project but it was abandoned. That means that you will have to do some background research for it and we will definitely need a design document or FLIP for it.
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 Hi @greghogan, I was able to fix the problem in `fromDataSet()` and `groupReduceOnEdges()` with `EdgesFunction`. For the remaining uses, I haven't found a way to pass all the input types correctly: they each involve 3 input types, while the `createTypeInfo()` method only accepts two. I have also tried extracting the input types from the wrapping functions, but that didn't work either. Any ideas? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 Thanks! Let me look into these and I'll get back to you.
[GitHub] flink issue #2842: [FLINK-5097][gelly] Add missing input type information to...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2842 @greghogan if the input types are known, we should pass them, yes. What other cases did you find?
[GitHub] flink pull request #2842: [FLINK-5097][gelly] Add missing input type informa...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2842#discussion_r88926689 --- Diff: flink-libraries/flink-gelly/src/test/java/org/apache/flink/graph/test/operations/TypeExtractorTest.java --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph.test.operations; + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.typeinfo.BasicTypeInfo; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.typeutils.TupleTypeInfo; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.test.TestGraphUtils; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.util.ArrayList; + +public class TypeExtractorTest { + + private Graph<Long, Long, Long> inputGraph; + + + @Before + public void setUp() throws Exception { + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + DataSet<Vertex<Long, Long>> vertices = TestGraphUtils.getLongLongVertexData(env); + DataSet<Edge<Long, Long>> edges = TestGraphUtils.getLongLongEdgeData(env); + inputGraph = Graph.fromDataSet(vertices, edges, env); + } + + public class TestGraphWithGeneric { + + public DataSet<Vertex<K, Tuple2<K, Integer>>> mapVertices(Graph<K, Long, Long> input) { + return input.mapVertices(new VertexMapper()).getVertices(); + } + + public DataSet<Edge<K, Tuple2<K, Integer>>> mapEdges(Graph<K, Long, Long> input) { + return input.mapEdges(new EdgeMapper()).getEdges(); + } + } + + @Test + public void testMapVerticesType() throws Exception { + TestGraphWithGeneric test = new TestGraphWithGeneric<>(); + + // test type extraction in mapVertices + DataSet<Vertex<Long, Tuple2<Long, Integer>>> outVertices = test.mapVertices(inputGraph); + Assert.assertEquals(true, (new TupleTypeInfo(Vertex.class, BasicTypeInfo.LONG_TYPE_INFO, + new TupleTypeInfo<Tuple2<Long, Integer>>(BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO))) + .equals(outVertices.getType())); + } + + @Test + public void testMapEdgesType() throws Exception { + 
TestGraphWithGeneric test = new TestGraphWithGeneric<>(); + + // test type extraction in mapEdges + DataSet<Edge<Long, Tuple2<Long, Integer>>> outEdges = test.mapEdges(inputGraph); + Assert.assertEquals(true, (new TupleTypeInfo(Edge.class, BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.LONG_TYPE_INFO, + new TupleTypeInfo<Tuple2<Long, Integer>>(BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO))) + .equals(outEdges.getType())); + } + + public static final class VertexMapper implements MapFunction<Vertex<K, Long>, Tuple2<K, Integer>> { + + private final Tuple2<K, Integer> outTuple = new Tuple2<>(); + + @Override + public Tuple2<K, Integer> map(Vertex<K, Long> inputVertex) throws Exception { + outTuple.setField(inputVertex.getId(), 0); + outTuple.setField(inputVertex.getValue().intValue(), 0); --- End diff -- `map()` could even be empty here, since this is never executed.
[GitHub] flink pull request #2842: [FLINK-5097][gelly] Add missing input type informa...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2842#discussion_r88926534 --- Diff: flink-libraries/flink-gelly/src/test/java/org/apache/flink/graph/test/operations/TypeExtractorTest.java --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph.test.operations; + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.typeinfo.BasicTypeInfo; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.typeutils.TupleTypeInfo; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.test.TestGraphUtils; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.util.ArrayList; + +public class TypeExtractorTest { + + private Graph<Long, Long, Long> inputGraph; + + + @Before + public void setUp() throws Exception { + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + DataSet<Vertex<Long, Long>> vertices = TestGraphUtils.getLongLongVertexData(env); + DataSet<Edge<Long, Long>> edges = TestGraphUtils.getLongLongEdgeData(env); + inputGraph = Graph.fromDataSet(vertices, edges, env); + } + + public class TestGraphWithGeneric { --- End diff -- Not necessary. I tried to create a minimal example of the reported case. The mapping methods could also be called inside the test methods. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2842: [FLINK-5097][gelly] Add missing input type informa...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2842#discussion_r88925478 --- Diff: flink-libraries/flink-gelly/src/test/java/org/apache/flink/graph/test/operations/TypeExtractorTest.java --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.test.operations; + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.typeinfo.BasicTypeInfo; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.typeutils.TupleTypeInfo; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.test.TestGraphUtils; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.util.ArrayList; + +public class TypeExtractorTest { + + private Graph<Long, Long, Long> inputGraph; + + + @Before + public void setUp() throws Exception { + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); --- End diff -- I don't think it'd make a difference. 
There is no execution.
[GitHub] flink pull request #2842: [FLINK-5097][gelly] Add missing input type info...
GitHub user vasia opened a pull request: https://github.com/apache/flink/pull/2842 [FLINK-5097][gelly] Add missing input type information to TypeExtr… I've managed to reproduce @otherwise777's error as reported in the mailing list and added a test case that failed before the change. @twalthr please take a look when you have some time, thanks! You can merge this pull request into a Git repository by running: $ git pull https://github.com/vasia/flink flink-5097 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2842.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2842 commit ea098c767151b6ba91fe54669a921e6303cd8c4d Author: vasia <va...@apache.org> Date: 2016-11-19T14:35:43Z [FLINK-5097][gelly] Add missing input type information to TypeExtractor
[jira] [Commented] (FLINK-1536) Graph partitioning operators for Gelly
[ https://issues.apache.org/jira/browse/FLINK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683366#comment-15683366 ] Vasia Kalavri commented on FLINK-1536: -- Hi [~ivan.mushketyk] afaik, nobody is currently working on this. > Graph partitioning operators for Gelly > -- > > Key: FLINK-1536 > URL: https://issues.apache.org/jira/browse/FLINK-1536 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Vasia Kalavri >Priority: Minor > > Smart graph partitioning can significantly improve the performance and > scalability of graph analysis applications. Depending on the computation > pattern, a graph partitioning algorithm divides the graph into (maybe > overlapping) subgraphs, optimizing some objective. For example, if > communication is performed across graph edges, one might want to minimize the > edges that cross from one partition to another. > The problem of graph partitioning is a well studied problem and several > algorithms have been proposed in the literature. The goal of this project > would be to choose a few existing partitioning techniques and implement the > corresponding graph partitioning operators for Gelly. > Some related literature can be found [here| > http://www.citeulike.org/user/vasiakalavri/tag/graph-partitioning].
[jira] [Commented] (FLINK-2254) Add Bipartite Graph Support for Gelly
[ https://issues.apache.org/jira/browse/FLINK-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683354#comment-15683354 ] Vasia Kalavri commented on FLINK-2254: -- Hey [~ivan.mushketyk], I would start with the easy ones, i.e. counts and degrees. I would consider the clustering coefficient as a separate case, possibly as a library algorithm. > Add Bipartite Graph Support for Gelly > - > > Key: FLINK-2254 > URL: https://issues.apache.org/jira/browse/FLINK-2254 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Andra Lungu >Assignee: Ivan Mushketyk > Labels: requires-design-doc > > A bipartite graph is a graph for which the set of vertices can be divided > into two disjoint sets such that each edge has its source vertex in the > first set and its target vertex in the second set. We would like to > support efficient operations for this type of graphs along with a set of > metrics(http://jponnela.com/web_documents/twomode.pdf).
[GitHub] flink issue #2564: [FLINK-2254] Add BipartiateGraph class
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2564 Thanks @mushketyk. @greghogan are you shepherding this PR or shall I?
[jira] [Updated] (FLINK-5097) The TypeExtractor is missing input type information in some Graph methods
[ https://issues.apache.org/jira/browse/FLINK-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-5097: - Description: The TypeExtractor is called without information about the input type in {{mapVertices}} and {{mapEdges}} although this information can be easily retrieved. (was: The TypeExtractor is called without information about the input type in {{mapVertices}}, {{mapEdges}}, and {{fromDataSet}}, although this information can be easily retrieved.) > The TypeExtractor is missing input type information in some Graph methods > - > > Key: FLINK-5097 > URL: https://issues.apache.org/jira/browse/FLINK-5097 > Project: Flink > Issue Type: Bug > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > > The TypeExtractor is called without information about the input type in > {{mapVertices}} and {{mapEdges}} although this information can be easily > retrieved.
[jira] [Created] (FLINK-5097) The TypeExtractor is missing input type information in some Graph methods
Vasia Kalavri created FLINK-5097: Summary: The TypeExtractor is missing input type information in some Graph methods Key: FLINK-5097 URL: https://issues.apache.org/jira/browse/FLINK-5097 Project: Flink Issue Type: Bug Components: Gelly Reporter: Vasia Kalavri Assignee: Vasia Kalavri The TypeExtractor is called without information about the input type in {{mapVertices}}, {{mapEdges}}, and {{fromDataSet}}, although this information can be easily retrieved.
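The root cause is ordinary Java type erasure: a generic function instance carries no record of its type arguments, so an extractor needs the input type as an extra hint unless the function was created as a (possibly anonymous) subclass that pins the arguments into class metadata. A Flink-free sketch of that distinction (the `MapFn` interface here is an illustrative stand-in, not Flink's `MapFunction`):

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Why a type extractor may need the input type: Java erases generics on
// instances, so a function object created from a lambda (or raw instance)
// carries no record of its type arguments. Only subclassing -- e.g. an
// anonymous class -- records them in the class file, where reflection can
// recover them.
public class ErasureDemo {

    public interface MapFn<I, O> { O map(I in); }

    public static String extractedTypeArgs(MapFn<?, ?> fn) {
        // Inspect the generic interfaces of the function's concrete class.
        for (Type t : fn.getClass().getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) t;
                Type[] args = p.getActualTypeArguments();
                return args[0].getTypeName() + " -> " + args[1].getTypeName();
            }
        }
        return "unknown (erased)";   // lambda or raw class: nothing to recover
    }
}
```

An anonymous subclass yields `java.lang.Long -> java.lang.String`, while a lambda of the same type yields `unknown (erased)` — in the latter case the extractor can only proceed if the input type is supplied separately.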
[jira] [Updated] (FLINK-3551) Sync Scala and Java Streaming Examples
[ https://issues.apache.org/jira/browse/FLINK-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-3551: - Assignee: Lim Chee Hau > Sync Scala and Java Streaming Examples > -- > > Key: FLINK-3551 > URL: https://issues.apache.org/jira/browse/FLINK-3551 > Project: Flink > Issue Type: Sub-task > Components: Examples >Affects Versions: 1.0.0 >Reporter: Stephan Ewen >Assignee: Lim Chee Hau > Fix For: 1.0.1 > > > The Scala examples lag behind the Java examples
[GitHub] flink issue #2725: [FLINK-4963] [gelly] Tabulate edge direction for directed...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2725 Hey @greghogan, I didn't have time to review this one, you're a fast merger :) It looks like none of `VertexMetrics`, `EdgeMetrics`, `AverageClusteringCoefficient` are mentioned in the gelly docs. Could you please add them in the "Library Methods" section? Thanks!
[GitHub] flink issue #2730: [FLINK-4970] [gelly] Parameterize vertex value for SSSP
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2730 Thanks! I agree on `Comparable` for vertex types. Good catch!
[GitHub] flink issue #2731: [FLINK-4934] [gelly] Triadic Census
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2731 Hi @greghogan, do we really need a 12k-line csv and a 32k-line csv to test this?
[GitHub] flink issue #2730: [FLINK-4970] [gelly] Parameterize vertex value for SSSP
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2730 Thanks for the PR @greghogan. Changes look good. Just make sure to also update the docs before merging.
[GitHub] flink issue #2670: [FLINK-4204] [gelly] Clean up gelly-examples
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2670 +1 and thanks for the benchmarking link!
[GitHub] flink issue #2670: [FLINK-4204] [gelly] Clean up gelly-examples
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2670 Hi @greghogan, I really like the cleanup and new organization! Two thoughts: - is the plan to add drivers for all library methods? - shall we remove the `GraphMetrics` example since there is a better driver?
[jira] [Resolved] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration
[ https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-3888. -- Resolution: Fixed > Custom Aggregator with Convergence can't be registered directly with > DeltaIteration > --- > > Key: FLINK-3888 > URL: https://issues.apache.org/jira/browse/FLINK-3888 > Project: Flink > Issue Type: Bug > Components: Iterations >Reporter: Martin Liesenberg >Assignee: Vasia Kalavri > Fix For: 1.2.0 > > > Contrary to the > [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html] > the method to add an aggregator with a custom convergence criterion to a > DeltaIteration is not exposed directly to DeltaIteration, but can only be > accessed via the {{aggregatorRegistry}}. > Moreover, when registering an aggregator with a custom convergence criterion > and running the program, the following exception appears in the logs: > {noformat} > Error: Cannot use custom convergence criterion with workset iteration. > Workset iterations have implicit convergence criterion where workset is empty. > org.apache.flink.optimizer.CompilerException: Error: Cannot use custom > convergence criterion with workset iteration. Workset iterations have > implicit convergence criterion where workset is empty. 
> at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164) > at > org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898) > at org.apache.flink.api.java.DataSet.collect(DataSet.java:410) > at org.apache.flink.api.java.DataSet.print(DataSet.java:1605) > {noformat} > The issue has been found while discussing FLINK-2926
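The distinction at play — the implicit workset-empty criterion versus a custom aggregator-based criterion — can be sketched independently of Flink. All names below are illustrative, not the actual `DeltaIteration` API; the halving workset and the aggregate value are stand-ins for real superstep dynamics:

```java
// Sketch of the two convergence modes discussed in FLINK-3888 (hypothetical
// names): delta iterations always stop when the workset is empty, but a
// custom aggregator criterion can additionally stop the iteration early.
public class ConvergenceSketch {

    public interface ConvergenceCriterion<T> {
        boolean isConverged(int superstep, T aggregate);
    }

    // Runs until maxIterations, the workset empties (implicit criterion),
    // or the custom criterion (if any) fires on the per-superstep aggregate.
    public static int run(int worksetSize, int maxIterations,
                          ConvergenceCriterion<Long> custom) {
        for (int step = 1; step <= maxIterations; step++) {
            worksetSize /= 2;                  // stand-in for one superstep
            long aggregate = worksetSize;      // stand-in aggregator value
            if (worksetSize == 0) return step; // implicit: workset is empty
            if (custom != null && custom.isConverged(step, aggregate)) {
                return step;                   // custom criterion fired
            }
        }
        return maxIterations;
    }
}
```

With a workset of 1024 halving each superstep, the implicit criterion stops at step 11, while a custom criterion such as "aggregate below 100" stops at step 4 — which is the early-termination capability the fix exposes.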
[jira] [Updated] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration
[ https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-3888: - Fix Version/s: 1.2.0 > Custom Aggregator with Convergence can't be registered directly with > DeltaIteration > --- > > Key: FLINK-3888 > URL: https://issues.apache.org/jira/browse/FLINK-3888 > Project: Flink > Issue Type: Bug > Components: Iterations >Reporter: Martin Liesenberg >Assignee: Vasia Kalavri > Fix For: 1.2.0 > > > Contrary to the > [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html] > the method to add an aggregator with a custom convergence criterion to a > DeltaIteration is not exposed directly to DeltaIteration, but can only be > accessed via the {{aggregatorRegistry}}. > Moreover, when registering an aggregator with a custom convergence criterion > and running the program, the following exception appears in the logs: > {noformat} > Error: Cannot use custom convergence criterion with workset iteration. > Workset iterations have implicit convergence criterion where workset is empty. > org.apache.flink.optimizer.CompilerException: Error: Cannot use custom > convergence criterion with workset iteration. Workset iterations have > implicit convergence criterion where workset is empty. 
> at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198) > at > org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164) > at > org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898) > at org.apache.flink.api.java.DataSet.collect(DataSet.java:410) > at org.apache.flink.api.java.DataSet.print(DataSet.java:1605) > {noformat} > The issue has been found while discussing FLINK-2926
[jira] [Updated] (FLINK-4129) Remove the example HITSAlgorithm
[ https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-4129: - Issue Type: Improvement (was: Bug) > Remove the example HITSAlgorithm > > > Key: FLINK-4129 > URL: https://issues.apache.org/jira/browse/FLINK-4129 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.2.0 > > > {{HITSAlgorithm}} tests for convergence by summing the difference of each > authority score minus the average score. This is simply comparing the sum of > scores against the previous sum of scores which is not a good test for > convergence. > {code} > // count the diff value of sum of authority scores > diffSumAggregator.aggregate(previousAuthAverage - > newAuthorityValue.getValue()); > {code}
[jira] [Resolved] (FLINK-4129) Remove the example HITSAlgorithm
[ https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-4129. -- Resolution: Fixed > Remove the example HITSAlgorithm > > > Key: FLINK-4129 > URL: https://issues.apache.org/jira/browse/FLINK-4129 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.2.0 > > > {{HITSAlgorithm}} tests for convergence by summing the difference of each > authority score minus the average score. This is simply comparing the sum of > scores against the previous sum of scores which is not a good test for > convergence. > {code} > // count the diff value of sum of authority scores > diffSumAggregator.aggregate(previousAuthAverage - > newAuthorityValue.getValue()); > {code}
[jira] [Updated] (FLINK-4129) Remove the example HITSAlgorithm
[ https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri updated FLINK-4129: - Summary: Remove the example HITSAlgorithm (was: HITSAlgorithm should test for element-wise convergence) > Remove the example HITSAlgorithm > > > Key: FLINK-4129 > URL: https://issues.apache.org/jira/browse/FLINK-4129 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.2.0 > > > {{HITSAlgorithm}} tests for convergence by summing the difference of each > authority score minus the average score. This is simply comparing the sum of > scores against the previous sum of scores which is not a good test for > convergence. > {code} > // count the diff value of sum of authority scores > diffSumAggregator.aggregate(previousAuthAverage - > newAuthorityValue.getValue()); > {code}
[GitHub] flink issue #2663: [FLINK-4129] [gelly] HITSAlgorithm should test for elemen...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2663 Thanks, will merge.
[GitHub] flink issue #2606: [FLINK-3888] Allow custom convergence criterion in delta ...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2606 Thank you for the review @greghogan! I have addressed your comments.
[GitHub] flink pull request #2606: [FLINK-3888] Allow custom convergence criterion in...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/2606#discussion_r83632808 --- Diff: flink-optimizer/src/main/java/org/apache/flink/optimizer/plantranslate/JobGraphGenerator.java --- @@ -1513,14 +1513,21 @@ private void finalizeWorksetIteration(IterationDescriptor descr) { String convAggName = aggs.getConvergenceCriterionAggregatorName(); ConvergenceCriterion convCriterion = aggs.getConvergenceCriterion(); - + if (convCriterion != null || convAggName != null) { - throw new CompilerException("Error: Cannot use custom convergence criterion with workset iteration. Workset iterations have implicit convergence criterion where workset is empty."); + if (convCriterion == null) { + throw new CompilerException("Error: Convergence criterion aggregator set, but criterion is null."); + } + if (convAggName == null) { + throw new CompilerException("Error: Aggregator convergence criterion set, but aggregator is null."); + } + + syncConfig.setConvergenceCriterion(convAggName, convCriterion); } headConfig.addIterationAggregator(WorksetEmptyConvergenceCriterion.AGGREGATOR_NAME, new LongSumAggregator()); syncConfig.addIterationAggregator(WorksetEmptyConvergenceCriterion.AGGREGATOR_NAME, new LongSumAggregator()); - syncConfig.setConvergenceCriterion(WorksetEmptyConvergenceCriterion.AGGREGATOR_NAME, new WorksetEmptyConvergenceCriterion()); + syncConfig.setDefaultConvergenceCriterion(WorksetEmptyConvergenceCriterion.AGGREGATOR_NAME, new WorksetEmptyConvergenceCriterion()); --- End diff -- Sure that's possible, but each iteration will have its own TaskConfig.
[jira] [Commented] (FLINK-4129) HITSAlgorithm should test for element-wise convergence
[ https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579933#comment-15579933 ]

Vasia Kalavri commented on FLINK-4129:
--

I think having two HITS examples could be confusing to users. Is this implementation showcasing some feature that no other example does, or could we simply remove it in favor of the HITS driver?

> HITSAlgorithm should test for element-wise convergence
> --
>
> Key: FLINK-4129
> URL: https://issues.apache.org/jira/browse/FLINK-4129
> Project: Flink
> Issue Type: Bug
> Components: Gelly
> Affects Versions: 1.1.0
> Reporter: Greg Hogan
> Priority: Minor
>
> {{HITSAlgorithm}} tests for convergence by summing the difference of each authority score minus the average score. This is simply comparing the sum of scores against the previous sum of scores, which is not a good test for convergence.
> {code}
> // count the diff value of sum of authority scores
> diffSumAggregator.aggregate(previousAuthAverage - newAuthorityValue.getValue());
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
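[Editor's note] The weakness of the aggregate test is easy to demonstrate with plain arrays: signed differences can cancel out, so the summed value looks converged while individual scores still move. A minimal plain-Java sketch, independent of the Gelly/aggregator API (all names below are illustrative):

```java
public class ConvergenceCheck {

    // Aggregate test in the spirit of HITSAlgorithm: sum of signed differences.
    static double signedDiffSum(double[] prev, double[] curr) {
        double sum = 0.0;
        for (int i = 0; i < prev.length; i++) {
            sum += prev[i] - curr[i];
        }
        return sum;
    }

    // Element-wise test: largest absolute per-score change.
    static double maxAbsDiff(double[] prev, double[] curr) {
        double max = 0.0;
        for (int i = 0; i < prev.length; i++) {
            max = Math.max(max, Math.abs(prev[i] - curr[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] prev = {0.5, 0.5};
        double[] curr = {0.9, 0.1}; // scores swapped mass; the total is unchanged
        System.out.println(signedDiffSum(prev, curr)); // 0.0 -> looks converged
        System.out.println(maxAbsDiff(prev, curr));    // 0.4 -> clearly not converged
    }
}
```

An element-wise criterion (e.g. max absolute change below a threshold) cannot be fooled by such cancellation, which is what the issue title asks for.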
[jira] [Commented] (FLINK-1091) Allow joins with the solution set using key selectors
[ https://issues.apache.org/jira/browse/FLINK-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579877#comment-15579877 ]

Vasia Kalavri commented on FLINK-1091:
--

Hi [~neelesh77], I'm not working on this. I've unassigned the issue. Do you have a use-case where you need this?

> Allow joins with the solution set using key selectors
> --
>
> Key: FLINK-1091
> URL: https://issues.apache.org/jira/browse/FLINK-1091
> Project: Flink
> Issue Type: Sub-task
> Components: Iterations
> Reporter: Vasia Kalavri
> Priority: Minor
> Labels: easyfix, features
>
> Currently, the solution set may only be joined with using tuple field positions.
> A possible solution can be providing explicit functions "joinWithSolution" and "coGroupWithSolution" to make sure the keys used are valid.
[jira] [Updated] (FLINK-1091) Allow joins with the solution set using key selectors
[ https://issues.apache.org/jira/browse/FLINK-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vasia Kalavri updated FLINK-1091:
--

Assignee: (was: Vasia Kalavri)

> Allow joins with the solution set using key selectors
> --
>
> Key: FLINK-1091
> URL: https://issues.apache.org/jira/browse/FLINK-1091
> Project: Flink
> Issue Type: Sub-task
> Components: Iterations
> Reporter: Vasia Kalavri
> Priority: Minor
> Labels: easyfix, features
>
> Currently, the solution set may only be joined with using tuple field positions.
> A possible solution can be providing explicit functions "joinWithSolution" and "coGroupWithSolution" to make sure the keys used are valid.
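[Editor's note] For context, the semantics such an explicit function would need can be sketched in plain Java against a hash-indexed solution set. This is a toy model only: the method name `joinWithSolution` comes from the issue text, but the signature and everything else below are hypothetical, not Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class KeySelectorJoin {

    // Join a workset against a solution set using an arbitrary key selector,
    // instead of being restricted to tuple field positions.
    static <T, K, S> List<String> joinWithSolution(
            List<T> workset,
            Map<K, S> solutionSet,
            Function<T, K> keySelector) {
        List<String> joined = new ArrayList<>();
        for (T element : workset) {
            K key = keySelector.apply(element);
            S match = solutionSet.get(key);
            if (match != null) {
                joined.add(key + "->" + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<Long, String> solution = new HashMap<>();
        solution.put(1L, "a");
        solution.put(2L, "b");
        // Records are (key, payload) pairs; the selector extracts field 0.
        List<long[]> workset = Arrays.asList(new long[]{1L, 10L}, new long[]{3L, 30L});
        List<String> result = joinWithSolution(workset, solution, rec -> rec[0]);
        System.out.println(result); // prints [1->a]: only the record with key 1 matches
    }
}
```

In the real runtime the solution set is partitioned and indexed by its declared key fields, which is why validating a user-supplied selector against those keys (as the issue suggests) matters.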
[GitHub] flink pull request #2606: Allow custom convergence criterion in delta iterat...
GitHub user vasia opened a pull request: https://github.com/apache/flink/pull/2606

Allow custom convergence criterion in delta iterations

As discussed in the Jira issue, this PR contains the following changes:
1. use `TaskConfig.setConvergenceCriterion()` to set the custom, user-defined convergence criterion (like in the case of bulk iteration)
2. add a new method `TaskConfig.setDefaultConvergenceCriterion()` to handle the default empty-workset convergence
3. check both criteria in `IterationSynchronizationSinkTask.checkForConvergence()`
4. expose the custom convergence criterion in `DeltaIteration`

It also contains some minor cleanup and corresponding changes in the `CollectionExecutor`. The iteration docs already state that custom convergence is possible, so no update needed there ;)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vasia/flink flink-3888

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2606.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2606

commit 41327034ae57d99fdde74bf24838d19b5aee31f3
Author: vasia <va...@apache.org>
Date: 2016-10-05T11:49:20Z
[FLINK-3888] allow registering a custom convergence criterion in delta iterations

commit e7ffc368f8542d3afe823b86c23e91379a03e21b
Author: vasia <va...@apache.org>
Date: 2016-10-05T12:06:47Z
[FLINK-3888] cleanups in iterations and aggregators code

commit 3b1a5e55686f665b1c1bb90943b0f853e71eae82
Author: vasia <va...@apache.org>
Date: 2016-10-06T20:25:43Z
[FLINK-3888] add delta convergence criterion in the CollectionExecutor

commit 9f41af544eecae0c37ae9470f4ff26f19b5dbdc0
Author: vasia <va...@apache.org>
Date: 2016-10-06T21:00:38Z
[FLINK-3888] add ITCases for delta custom convergence
[jira] [Assigned] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration
[ https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vasia Kalavri reassigned FLINK-3888:
--

Assignee: Vasia Kalavri

> Custom Aggregator with Convergence can't be registered directly with DeltaIteration
> --
>
> Key: FLINK-3888
> URL: https://issues.apache.org/jira/browse/FLINK-3888
> Project: Flink
> Issue Type: Bug
> Components: Iterations
> Reporter: Martin Liesenberg
> Assignee: Vasia Kalavri
>
> Contrary to the [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html], the method to add an aggregator with a custom convergence criterion to a DeltaIteration is not exposed directly on DeltaIteration, but can only be accessed via the {{aggregatorRegistry}}.
> Moreover, when registering an aggregator with a custom convergence criterion and running the program, the following exception appears in the logs:
> {noformat}
> Error: Cannot use custom convergence criterion with workset iteration. Workset iterations have implicit convergence criterion where workset is empty.
> org.apache.flink.optimizer.CompilerException: Error: Cannot use custom convergence criterion with workset iteration. Workset iterations have implicit convergence criterion where workset is empty.
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518)
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198)
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164)
> 	at org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76)
> 	at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898)
> 	at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
> 	at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
> {noformat}
> The issue has been found while discussing FLINK-2926
[GitHub] flink issue #2564: [FLINK-2254] Add BipartiateGraph class
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2564

Thanks for the update @mushketyk and for the review @greghogan. I agree with your suggestions. For the type parameters I would go for `<KT, KB, VVT, VVB, EV>`. Let me know if there's any other issue you'd like my opinion on.
[GitHub] flink issue #2587: [FLINK-4729] [gelly] Use optional VertexCentric CombineFu...
Github user vasia commented on the issue: https://github.com/apache/flink/pull/2587

Great cleanup! Thanks @greghogan. +1 to merge.
[jira] [Commented] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration
[ https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543256#comment-15543256 ]

Vasia Kalavri commented on FLINK-3888:
--

We shouldn't override the default convergence criterion of the delta iteration. When the workset is empty there's no work to do. Instead, if a custom criterion is provided, the convergence condition should be the disjunction of the two.

> Custom Aggregator with Convergence can't be registered directly with DeltaIteration
> --
>
> Key: FLINK-3888
> URL: https://issues.apache.org/jira/browse/FLINK-3888
> Project: Flink
> Issue Type: Bug
> Components: Iterations
> Reporter: Martin Liesenberg
>
> Contrary to the [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html], the method to add an aggregator with a custom convergence criterion to a DeltaIteration is not exposed directly on DeltaIteration, but can only be accessed via the {{aggregatorRegistry}}.
> Moreover, when registering an aggregator with a custom convergence criterion and running the program, the following exception appears in the logs:
> {noformat}
> Error: Cannot use custom convergence criterion with workset iteration. Workset iterations have implicit convergence criterion where workset is empty.
> org.apache.flink.optimizer.CompilerException: Error: Cannot use custom convergence criterion with workset iteration. Workset iterations have implicit convergence criterion where workset is empty.
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518)
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198)
> 	at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164)
> 	at org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76)
> 	at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898)
> 	at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
> 	at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
> {noformat}
> The issue has been found while discussing FLINK-2926

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
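[Editor's note] The disjunctive semantics discussed in this thread — the delta iteration terminates when the workset is empty or when a user-defined criterion fires — can be sketched in plain Java. This is a toy model: `isConverged` and the threshold-based custom criterion are illustrative stand-ins, not the actual synchronization-task internals.

```java
public class DeltaConvergence {

    // Implicit default criterion of a workset iteration:
    // converge when no elements were emitted into the workset.
    static boolean worksetEmpty(long worksetElementCount) {
        return worksetElementCount == 0;
    }

    // Combined check: the iteration stops when EITHER the implicit
    // workset-empty criterion OR the optional custom criterion
    // (modeled here as an aggregate falling below a threshold) holds.
    static boolean isConverged(long worksetElementCount,
                               double aggregate, double threshold,
                               boolean hasCustomCriterion) {
        boolean custom = hasCustomCriterion && aggregate < threshold;
        return worksetEmpty(worksetElementCount) || custom;
    }

    public static void main(String[] args) {
        System.out.println(isConverged(0, 5.0, 1.0, true));  // true: workset empty
        System.out.println(isConverged(10, 0.5, 1.0, true)); // true: custom criterion met
        System.out.println(isConverged(10, 5.0, 1.0, true)); // false: neither holds
    }
}
```

Note that the disjunction never weakens the default: an empty workset still terminates the iteration, which is exactly the behavior the comment above argues must be preserved.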