[jira] [Commented] (FLINK-5506) Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException

2017-12-01 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274340#comment-16274340
 ] 

Vasia Kalavri commented on FLINK-5506:
--

I only had a quick look at the code; I will need to re-read the paper to make 
sure the algorithm semantics remain correct with the following change:
I believe the problem is line 147 in {{CommunityDetection.java}}. The code 
assumes we have received only positive scores, while negative ones are indeed 
possible. Changing this line to {{double maxScore = -Double.MAX_VALUE;}} should 
fix it.
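
For illustration, here is a minimal, self-contained Java sketch of why the 
initialisation matters. It is not the actual Flink source; the class, method, and 
variable names are hypothetical.

{code}
import java.util.HashMap;
import java.util.Map;

public class MaxScoreSketch {

    // Picks the label with the highest score. Starting from -Double.MAX_VALUE
    // (rather than 0.0) guarantees that some label is selected even when all
    // received scores are negative.
    static long selectLabel(Map<Long, Double> labelsWithScores) {
        double maxScore = -Double.MAX_VALUE; // the proposed fix; 0.0 breaks on negative scores
        long bestLabel = -1L;
        for (Map.Entry<Long, Double> entry : labelsWithScores.entrySet()) {
            if (entry.getValue() > maxScore) {
                maxScore = entry.getValue();
                bestLabel = entry.getKey();
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        Map<Long, Double> scores = new HashMap<>();
        scores.put(1L, -0.5);
        scores.put(2L, -0.25);
        // With maxScore initialised to 0.0, no label would be selected here,
        // which is the kind of situation that ends in the reported NPE.
        System.out.println(selectLabel(scores)); // prints 2
    }
}
{code}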

> Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
> -
>
> Key: FLINK-5506
> URL: https://issues.apache.org/jira/browse/FLINK-5506
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Affects Versions: 1.1.4, 1.3.2, 1.4.1
>Reporter: Miguel E. Coimbra
>  Labels: easyfix, newbie
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Reporting this here as per Vasia's advice.
> I am having the following problem while trying out the 
> org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly API 
> (Java).
> Specs: JDK 1.8.0_102 x64
> Apache Flink: 1.1.4
> Suppose I have a very small (I tried an example with 38 vertices as well) 
> dataset stored in a tab-separated file 3-vertex.tsv:
> {code}
> #id1	id2	score
> 0	1	0
> 0	2	0
> 0	3	0
> {code}
> This is just a central vertex with 3 neighbors (which are not connected to 
> each other).
> I am loading the dataset and executing the algorithm with the following code:
> {code}
> // Load the data from the .tsv file.
> final DataSet<Tuple3<Long, Long, Double>> edgeTuples = 
> env.readCsvFile(inputPath)
> .fieldDelimiter("\t") // fields are separated by tabs
> .ignoreComments("#")  // comment lines start with "#"
> .types(Long.class, Long.class, Double.class);
> // Generate a graph and add reverse edges (undirected).
> final Graph<Long, Long, Double> graph = Graph.fromTupleDataSet(edgeTuples, 
> new MapFunction<Long, Long>() {
> private static final long serialVersionUID = 8713516577419451509L;
> public Long map(Long value) {
> return value;
> }
> },
> env).getUndirected();
> // CommunityDetection parameters.
> final double hopAttenuationDelta = 0.5d;
> final int iterationCount = 10;
> // Prepare and trigger the execution.
> DataSet<Vertex<Long, Long>> vs = graph.run(new 
> org.apache.flink.graph.library.CommunityDetection<Long>(iterationCount, 
> hopAttenuationDelta)).getVertices();
> vs.print();
> {code}
> Running this code throws the following exception (note the frame at 
> {{CommunityDetection.java:158}}):
> {code}
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158)
> at 
> org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389)
> at 
> org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218)
> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
> at 
> org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:146)
> at 
> org.apache.flink.runtime.iterative.task.IterationTailTask.run(IterationTailTask.java:107)
> at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
>

[jira] [Commented] (FLINK-2910) Reorganize / Combine Gelly tests

2017-03-06 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896927#comment-15896927
 ] 

Vasia Kalavri commented on FLINK-2910:
--

I think that's a good idea [~greghogan].

> Reorganize / Combine Gelly tests
> 
>
> Key: FLINK-2910
> URL: https://issues.apache.org/jira/browse/FLINK-2910
> Project: Flink
>  Issue Type: Test
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>Priority: Minor
> Fix For: 1.3.0
>
>
> - Some tests are spread out in different classes and could be combined as well, 
> e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 
> for neighborhood methods, etc.
> - Testing a binary operator (i.e. union and difference) is done in two 
> similar tests: one is testing the expected vertex set and one the expected 
> edge set. This can be combined in one test per operator using 
> {{LocalCollectionOutputFormat<>}}
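
As an illustration of the suggestion above, a hedged sketch of what a combined 
test could look like; {{graph1}} and {{graph2}} are assumed inputs and the 
assertions are placeholders, not an existing Gelly test.

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.LocalCollectionOutputFormat;
import org.apache.flink.graph.Edge;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.Vertex;

public class UnionTestSketch {

    // Collects both the vertex set and the edge set of the union result in a
    // single job, so the two existing per-operator tests can be merged.
    static void runUnionTest(Graph<Long, Long, Long> graph1,
                             Graph<Long, Long, Long> graph2,
                             ExecutionEnvironment env) throws Exception {
        Graph<Long, Long, Long> result = graph1.union(graph2);

        List<Vertex<Long, Long>> vertices = new ArrayList<>();
        List<Edge<Long, Long>> edges = new ArrayList<>();
        result.getVertices().output(new LocalCollectionOutputFormat<>(vertices));
        result.getEdges().output(new LocalCollectionOutputFormat<>(edges));
        env.execute();

        // assert on the expected vertex and edge sets here
    }
}
{code}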



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (FLINK-4949) Refactor Gelly driver inputs

2017-03-02 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892696#comment-15892696
 ] 

Vasia Kalavri edited comment on FLINK-4949 at 3/2/17 6:05 PM:
--

Thank you [~greghogan]. I can review during the weekend.


was (Author: vkalavri):
Thanks you [~greghogan]. I can review during the weekend.

> Refactor Gelly driver inputs
> 
>
> Key: FLINK-4949
> URL: https://issues.apache.org/jira/browse/FLINK-4949
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.2.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.3.0
>
>
> The Gelly drivers started as simple wrappers around library algorithms but 
> have grown to handle a matrix of input sources while often running multiple 
> algorithms and analytics with custom parameterization.
> This ticket will refactor the sourcing of the input graph into separate 
> classes for CSV files and RMat which will simplify the inclusion of new data 
> sources.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-4949) Refactor Gelly driver inputs

2017-03-02 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892696#comment-15892696
 ] 

Vasia Kalavri commented on FLINK-4949:
--

Thank you [~greghogan]. I can review during the weekend.

> Refactor Gelly driver inputs
> 
>
> Key: FLINK-4949
> URL: https://issues.apache.org/jira/browse/FLINK-4949
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.2.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.3.0
>
>
> The Gelly drivers started as simple wrappers around library algorithms but 
> have grown to handle a matrix of input sources while often running multiple 
> algorithms and analytics with custom parameterization.
> This ticket will refactor the sourcing of the input graph into separate 
> classes for CSV files and RMat which will simplify the inclusion of new data 
> sources.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-2910) Reorganize / Combine Gelly tests

2017-02-28 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-2910:
-
Summary: Reorganize / Combine Gelly tests  (was: Combine tests for binary 
graph operators)

> Reorganize / Combine Gelly tests
> 
>
> Key: FLINK-2910
> URL: https://issues.apache.org/jira/browse/FLINK-2910
> Project: Flink
>  Issue Type: Test
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>Priority: Minor
> Fix For: 1.3.0
>
>
> - Some tests are spread out in different classes and could be combined as well, 
> e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 
> for neighborhood methods, etc.
> - Testing a binary operator (i.e. union and difference) is done in two 
> similar tests: one is testing the expected vertex set and one the expected 
> edge set. This can be combined in one test per operator using 
> {{LocalCollectionOutputFormat<>}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-2910) Combine tests for binary graph operators

2017-02-28 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-2910:
-
Description: 
- Some tests are spread out in different classes and could be combined as well, 
e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 
for neighborhood methods, etc.
- Testing a binary operator (i.e. union and difference) is done in two similar 
tests: one is testing the expected vertex set and one the expected edge set. 
This can be combined in one test per operator using 
{{LocalCollectionOutputFormat<>}}

  was:Atm, testing a binary operator (i.e. union and difference) is done in two 
similar tests: one is testing the expected vertex set and one the expected edge 
set. This can be combined in one test per operator using 
{{LocalCollectionOutputFormat<>}}


> Combine tests for binary graph operators
> 
>
> Key: FLINK-2910
> URL: https://issues.apache.org/jira/browse/FLINK-2910
> Project: Flink
>  Issue Type: Test
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>Priority: Minor
> Fix For: 1.3.0
>
>
> - Some tests are spread out in different classes and could be combined as well, 
> e.g. we have 3 different classes for testing graph creation, 2 for degrees, 4 
> for neighborhood methods, etc.
> - Testing a binary operator (i.e. union and difference) is done in two 
> similar tests: one is testing the expected vertex set and one the expected 
> edge set. This can be combined in one test per operator using 
> {{LocalCollectionOutputFormat<>}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-2910) Combine tests for binary graph operators

2017-02-28 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-2910:
-
Fix Version/s: 1.3.0

> Combine tests for binary graph operators
> 
>
> Key: FLINK-2910
> URL: https://issues.apache.org/jira/browse/FLINK-2910
> Project: Flink
>  Issue Type: Test
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>Priority: Minor
> Fix For: 1.3.0
>
>
> Atm, testing a binary operator (i.e. union and difference) is done in two 
> similar tests: one is testing the expected vertex set and one the expected 
> edge set. This can be combined in one test per operator using 
> {{LocalCollectionOutputFormat<>}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-2910) Combine tests for binary graph operators

2017-02-28 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889618#comment-15889618
 ] 

Vasia Kalavri commented on FLINK-2910:
--

Thanks for the heads-up [~uce]. Yes, this is still relevant. I will update it.
[~mju] are you still planning to work on this?

> Combine tests for binary graph operators
> 
>
> Key: FLINK-2910
> URL: https://issues.apache.org/jira/browse/FLINK-2910
> Project: Flink
>  Issue Type: Test
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>Priority: Minor
>
> Atm, testing a binary operator (i.e. union and difference) is done in two 
> similar tests: one is testing the expected vertex set and one the expected 
> edge set. This can be combined in one test per operator using 
> {{LocalCollectionOutputFormat<>}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations

2017-02-09 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-5127:
-
Affects Version/s: 1.1.0, 1.2.0

> Reduce the amount of intermediate data in vertex-centric iterations
> ---
>
> Key: FLINK-5127
> URL: https://issues.apache.org/jira/browse/FLINK-5127
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 1.3.0
>
>
> The vertex-centric plan contains a join between the workset (messages) and 
> the solution set (vertices) that outputs (vertex state, message) tuples. This 
> intermediate dataset is then co-grouped with the edges to provide the Pregel 
> interface directly.
> This issue proposes an improvement to reduce the size of this intermediate 
> dataset. In particular, the vertex state does not have to be attached to all 
> the output tuples of the join. If we replace the join with a coGroup and use 
> an `Either` type, we can attach the vertex state to the first tuple only. The 
> subsequent coGroup can retrieve the vertex state from the first tuple and 
> correctly expose the Pregel interface.
> In my preliminary experiments, I find that this change reduces intermediate 
> data by 2x for small vertex state and 4-5x for large vertex states. 
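
To make the idea above concrete, here is a small self-contained sketch (plain 
Java, not the actual Gelly operator code; the record and method names are made 
up): the vertex state rides along on the first record of each group only, while 
the remaining records carry just the messages.

{code}
import java.util.ArrayList;
import java.util.List;

public class StateOnceSketch {

    // A poor man's Either: exactly one of the two fields is set.
    static class StateOrMessage {
        final Double vertexState; // set only on the first record of a group
        final String message;     // set on all subsequent records
        StateOrMessage(Double vertexState, String message) {
            this.vertexState = vertexState;
            this.message = message;
        }
    }

    // Emits the records for one vertex: the state is attached once instead of
    // being duplicated onto every (vertex, message) pair, which is what shrinks
    // the intermediate dataset.
    static List<StateOrMessage> emitGroup(double vertexState, List<String> messages) {
        List<StateOrMessage> out = new ArrayList<>();
        out.add(new StateOrMessage(vertexState, null));
        for (String m : messages) {
            out.add(new StateOrMessage(null, m));
        }
        return out;
    }
}
{code}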



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations

2017-02-09 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-5127:
-
Fix Version/s: 1.3.0

> Reduce the amount of intermediate data in vertex-centric iterations
> ---
>
> Key: FLINK-5127
> URL: https://issues.apache.org/jira/browse/FLINK-5127
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 1.3.0
>
>
> The vertex-centric plan contains a join between the workset (messages) and 
> the solution set (vertices) that outputs (vertex state, message) tuples. This 
> intermediate dataset is then co-grouped with the edges to provide the Pregel 
> interface directly.
> This issue proposes an improvement to reduce the size of this intermediate 
> dataset. In particular, the vertex state does not have to be attached to all 
> the output tuples of the join. If we replace the join with a coGroup and use 
> an `Either` type, we can attach the vertex state to the first tuple only. The 
> subsequent coGroup can retrieve the vertex state from the first tuple and 
> correctly expose the Pregel interface.
> In my preliminary experiments, I find that this change reduces intermediate 
> data by 2x for small vertex state and 4-5x for large vertex states. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] (FLINK-1526) Add Minimum Spanning Tree library method and example

2017-01-30 Thread Vasia Kalavri (JIRA)

Vasia Kalavri commented on FLINK-1526:
--

Hi Xingcan Cui, the problem is that currently you cannot have an iteration (e.g. 
vertex-centric) inside a for-loop or a while-loop. So, your pseudocode won't work 
(well, it will, but only for very small inputs). I believe "no value updates" 
refers to no vertex values changing. Where did you see this?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] (FLINK-1526) Add Minimum Spanning Tree library method and example

2017-01-29 Thread Vasia Kalavri (JIRA)

Vasia Kalavri commented on FLINK-1526:
--

Hi Xingcan Cui, thank you for your interest in this issue. As you can see in the 
comments history, contributors have had problems completing this task without 
support for for-loop iterations. Are you planning to take a different approach? 
Could you describe how you're planning to proceed? Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)




[jira] [Created] (FLINK-5597) Improve the LocalClusteringCoefficient documentation

2017-01-20 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-5597:


 Summary: Improve the LocalClusteringCoefficient documentation
 Key: FLINK-5597
 URL: https://issues.apache.org/jira/browse/FLINK-5597
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Gelly
Reporter: Vasia Kalavri


The LocalClusteringCoefficient usage section should explain what the algorithm 
output is and how to retrieve the actual local clustering coefficient scores 
from it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5434) Remove unsupported project() transformation from Scala DataStream docs

2017-01-10 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-5434:


 Summary: Remove unsupported project() transformation from Scala 
DataStream docs
 Key: FLINK-5434
 URL: https://issues.apache.org/jira/browse/FLINK-5434
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Vasia Kalavri


The Scala DataStream API does not have a project() transformation, yet the docs 
list it as a supported operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5351) Make the TypeExtractor support functions with more than 2 inputs

2016-12-16 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-5351:


 Summary: Make the TypeExtractor support functions with more than 2 
inputs
 Key: FLINK-5351
 URL: https://issues.apache.org/jira/browse/FLINK-5351
 Project: Flink
  Issue Type: Improvement
  Components: Gelly, Type Serialization System
Reporter: Vasia Kalavri


Currently, the TypeExtractor doesn't support functions with more than 2 inputs. 
We found in FLINK-5097 that adding such support would be a useful feature for 
Gelly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-5097) The TypeExtractor is missing input type information in some Graph methods

2016-12-16 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-5097.
--
   Resolution: Fixed
Fix Version/s: 1.2.0

> The TypeExtractor is missing input type information in some Graph methods
> -
>
> Key: FLINK-5097
> URL: https://issues.apache.org/jira/browse/FLINK-5097
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 1.2.0
>
>
> The TypeExtractor is called without information about the input type in 
> {{mapVertices}} and {{mapEdges}} although this information can be easily 
> retrieved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-5311) Write user documentation for BipartiteGraph

2016-12-16 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-5311:
-
Issue Type: Improvement  (was: Bug)

> Write user documentation for BipartiteGraph
> ---
>
> Key: FLINK-5311
> URL: https://issues.apache.org/jira/browse/FLINK-5311
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
> Fix For: 1.2.0
>
>
> We need to add user documentation. The progress on BipartiteGraph can be 
> tracked in the following JIRA:
> https://issues.apache.org/jira/browse/FLINK-2254



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-5311) Write user documentation for BipartiteGraph

2016-12-16 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-5311.
--
   Resolution: Fixed
Fix Version/s: 1.2.0

> Write user documentation for BipartiteGraph
> ---
>
> Key: FLINK-5311
> URL: https://issues.apache.org/jira/browse/FLINK-5311
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
> Fix For: 1.2.0
>
>
> We need to add user documentation. The progress on BipartiteGraph can be 
> tracked in the following JIRA:
> https://issues.apache.org/jira/browse/FLINK-2254



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations

2016-12-15 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751890#comment-15751890
 ] 

Vasia Kalavri commented on FLINK-5127:
--

It'd be nice to have for 1.2, but I don't know when I'll have time to work on 
it. I'm hoping this weekend.

> Reduce the amount of intermediate data in vertex-centric iterations
> ---
>
> Key: FLINK-5127
> URL: https://issues.apache.org/jira/browse/FLINK-5127
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> The vertex-centric plan contains a join between the workset (messages) and 
> the solution set (vertices) that outputs (vertex state, message) tuples. This 
> intermediate dataset is then co-grouped with the edges to provide the Pregel 
> interface directly.
> This issue proposes an improvement to reduce the size of this intermediate 
> dataset. In particular, the vertex state does not have to be attached to all 
> the output tuples of the join. If we replace the join with a coGroup and use 
> an `Either` type, we can attach the vertex state to the first tuple only. The 
> subsequent coGroup can retrieve the vertex state from the first tuple and 
> correctly expose the Pregel interface.
> In my preliminary experiments, I find that this change reduces intermediate 
> data by 2x for small vertex state and 4-5x for large vertex states. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5245) Add support for BipartiteGraph mutations

2016-12-12 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744495#comment-15744495
 ] 

Vasia Kalavri commented on FLINK-5245:
--

My point is not that these features are useless for bipartite graphs, but we 
have to think about whether re-implementing these features specifically for 
bipartite graphs makes sense, e.g. because general graphs do not support them or 
because we can use the knowledge that we have a bipartite graph to make the 
implementation more efficient. For example, projection is a transformation that 
can only be applied to bipartite graphs. But if all you want to do is get the 
degrees of your bipartite graph, can you use the available Graph methods? Or 
can we provide a better way to get the degrees because we know we have a 
bipartite graph? These are the questions we have to ask for each of these 
features in the list in my opinion.

> Add support for BipartiteGraph mutations
> 
>
> Key: FLINK-5245
> URL: https://issues.apache.org/jira/browse/FLINK-5245
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>
> Implement methods for adding and removing vertices and edges similarly to 
> Graph class.
> Depends on https://issues.apache.org/jira/browse/FLINK-2254



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5245) Add support for BipartiteGraph mutations

2016-12-12 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742744#comment-15742744
 ] 

Vasia Kalavri commented on FLINK-5245:
--

I don't think so. We used to have some simple examples that showcased how to 
create a graph in this way, but I don't think we really need such methods for 
the bipartite graph. That said, we should probably go through all the bipartite 
features and decide whether they are useful, e.g. validator and generators. Do 
they even make sense for bipartite graphs? Or when do they?

> Add support for BipartiteGraph mutations
> 
>
> Key: FLINK-5245
> URL: https://issues.apache.org/jira/browse/FLINK-5245
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>
> Implement methods for adding and removing vertices and edges similarly to 
> Graph class.
> Depends on https://issues.apache.org/jira/browse/FLINK-2254



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1536) Graph partitioning operators for Gelly

2016-12-08 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733284#comment-15733284
 ] 

Vasia Kalavri commented on FLINK-1536:
--

The idea is what [~greghogan] describes. In a distributed graph processing 
system, you first have to partition the graph before you perform any 
computation. The performance of graph algorithms greatly depends on the 
resulting partitioning. A bad partitioning might assign disproportionally more 
vertices to one partition thus hurting load balancing or it might partition the 
graph so that the communication required is too high (or both). Currently, we 
only support hash partitioning; that is, vertices are randomly assigned to 
workers using the hash of their id. This strategy has very low overhead and 
results in good load balancing unless the graphs are skewed. For more details 
on this problem, I suggest you read some of the papers in the literature linked 
in the description of the issue [~ivan.mushketyk].
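
As a minimal illustration of the hash-partitioning strategy described above (not 
Flink's internal partitioner; the worker count and method are assumptions):

{code}
public class HashPartitionSketch {

    // Assigns a vertex to a worker from the hash of its id. Cheap and well
    // balanced for most graphs, but skewed graphs can still overload a worker,
    // and neighbouring vertices usually end up on different workers.
    static int assignWorker(long vertexId, int numWorkers) {
        return Math.floorMod(Long.hashCode(vertexId), numWorkers);
    }

    public static void main(String[] args) {
        for (long v = 0; v < 5; v++) {
            System.out.println("vertex " + v + " -> worker " + assignWorker(v, 3));
        }
    }
}
{code}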

> Graph partitioning operators for Gelly
> --
>
> Key: FLINK-1536
> URL: https://issues.apache.org/jira/browse/FLINK-1536
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Ivan Mushketyk
>Priority: Minor
>
> Smart graph partitioning can significantly improve the performance and 
> scalability of graph analysis applications. Depending on the computation 
> pattern, a graph partitioning algorithm divides the graph into (maybe 
> overlapping) subgraphs, optimizing some objective. For example, if 
> communication is performed across graph edges, one might want to minimize the 
> edges that cross from one partition to another.
> The problem of graph partitioning is a well studied problem and several 
> algorithms have been proposed in the literature. The goal of this project 
> would be to choose a few existing partitioning techniques and implement the 
> corresponding graph partitioning operators for Gelly.
> Some related literature can be found [here| 
> http://www.citeulike.org/user/vasiakalavri/tag/graph-partitioning].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-5161) accepting NullValue for VV in Gelly examples and GSA

2016-11-25 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri closed FLINK-5161.

Resolution: Not A Problem

> accepting NullValue for VV in Gelly examples and GSA
> 
>
> Key: FLINK-5161
> URL: https://issues.apache.org/jira/browse/FLINK-5161
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.3
>Reporter: wouter ligtenberg
>
> I made this topic a few days ago about EV; I meant VV back then. I don't know 
> why I suddenly thought about EV and it confused me. 
> In this Gelly example [1] and this GSA algorithm [2] a Vertex Value of Double 
> is required but never used; wouldn't it be better to change this into a 
> NullValue? I create a lot of data without Vertex Values and it seems to me 
> that this would be more efficient.
> I'd like to hear your thoughts on this.
> [1] 
> https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/SingleSourceShortestPaths.java
> [2] 
> https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/GSASingleSourceShortestPaths.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5161) accepting NullValue for VV in Gelly examples and GSA

2016-11-25 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695728#comment-15695728
 ] 

Vasia Kalavri commented on FLINK-5161:
--

Hi Wouter,
the vertex value carries the distance from every vertex to the source. Since 
this is a weighted SSSP, this is of type Double.
Gelly examples favor simplicity and demonstrate functionality. As a user, you 
should use the library algorithms. And in the library algorithm that you link 
to, the vertex value is actually parametrized (see the last commit), so you can 
use any type you like.



> accepting NullValue for VV in Gelly examples and GSA
> 
>
> Key: FLINK-5161
> URL: https://issues.apache.org/jira/browse/FLINK-5161
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.3
>Reporter: wouter ligtenberg
>
> I made this topic a few days ago about EV; I meant VV back then. I don't know 
> why I suddenly thought about EV and it confused me. 
> In this Gelly example [1] and this GSA algorithm [2] a Vertex Value of Double 
> is required but never used; wouldn't it be better to change this into a 
> NullValue? I create a lot of data without Vertex Values and it seems to me 
> that this would be more efficient.
> I'd like to hear your thoughts on this.
> [1] 
> https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/SingleSourceShortestPaths.java
> [2] 
> https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/GSASingleSourceShortestPaths.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-5152) accepting NullValue for EV in Gelly examples

2016-11-24 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri closed FLINK-5152.

Resolution: Not A Problem

> accepting NullValue for EV in Gelly examples
> 
>
> Key: FLINK-5152
> URL: https://issues.apache.org/jira/browse/FLINK-5152
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.3
>Reporter: wouter ligtenberg
> Fix For: 1.1.3
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> In this Gelly example [1] an EdgeValue of Double is required but never used; 
> wouldn't it be better to change this into a NullValue? I create a lot of data 
> without Edge Values and it seems to me that this would be more efficient.
> I'd like to hear your thoughts on this.
> [1] 
> https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/SingleSourceShortestPaths.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5152) accepting NullValue for EV in Gelly examples

2016-11-24 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692732#comment-15692732
 ] 

Vasia Kalavri commented on FLINK-5152:
--

Hi [~otherwise777],
this is an example of _weighted_ shortest paths. The edge value is added to the 
message in the scatter function, thus it cannot be NullValue. If you need a 
shortest paths implementation that ignores edge values, it should be easy to 
modify this example to do that.
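
A tiny self-contained sketch of the step in question (assumed names, not the 
example's actual scatter function): the edge weight is added to the current 
distance to form the message, which is why the edge value has to be numeric 
rather than NullValue.

{code}
public class WeightedScatterSketch {

    // The message a vertex sends along an outgoing edge in weighted SSSP.
    static double messageFor(double currentDistance, double edgeWeight) {
        return currentDistance + edgeWeight; // impossible with a NullValue edge value
    }

    public static void main(String[] args) {
        System.out.println(messageFor(3.0, 2.5)); // 5.5
    }
}
{code}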

> accepting NullValue for EV in Gelly examples
> 
>
> Key: FLINK-5152
> URL: https://issues.apache.org/jira/browse/FLINK-5152
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.3
>Reporter: wouter ligtenberg
> Fix For: 1.1.3
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> In this Gelly example [1] an EdgeValue of Double is required but never used; 
> wouldn't it be better to change this into a NullValue? I create a lot of data 
> without Edge Values and it seems to me that this would be more efficient.
> I'd like to hear your thoughts on this.
> [1] 
> https://github.com/apache/flink/blob/master/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/SingleSourceShortestPaths.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5127) Reduce the amount of intermediate data in vertex-centric iterations

2016-11-22 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-5127:


 Summary: Reduce the amount of intermediate data in vertex-centric 
iterations
 Key: FLINK-5127
 URL: https://issues.apache.org/jira/browse/FLINK-5127
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Reporter: Vasia Kalavri
Assignee: Vasia Kalavri


The vertex-centric plan contains a join between the workset (messages) and the 
solution set (vertices) that outputs (vertex state, message) tuples. This 
intermediate dataset is then co-grouped with the edges to provide the Pregel 
interface directly.

This issue proposes an improvement to reduce the size of this intermediate 
dataset. In particular, the vertex state does not have to be attached to all 
the output tuples of the join. If we replace the join with a coGroup and use an 
`Either` type, we can attach the vertex state to the first tuple only. The 
subsequent coGroup can retrieve the vertex state from the first tuple and 
correctly expose the Pregel interface.

In my preliminary experiments, I find that this change reduces intermediate 
data by 2x for small vertex state and 4-5x for large vertex states. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1536) Graph partitioning operators for Gelly

2016-11-22 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15686870#comment-15686870
 ] 

Vasia Kalavri commented on FLINK-1536:
--

This issue does not refer to bipartite graphs, even though we could extend it. 
It was initially created as a Google Summer of Code project but it was 
abandoned. That means that you will have to do some background research for it 
and we will definitely need a design document or FLIP for it.

> Graph partitioning operators for Gelly
> --
>
> Key: FLINK-1536
> URL: https://issues.apache.org/jira/browse/FLINK-1536
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Priority: Minor
>
> Smart graph partitioning can significantly improve the performance and 
> scalability of graph analysis applications. Depending on the computation 
> pattern, a graph partitioning algorithm divides the graph into (maybe 
> overlapping) subgraphs, optimizing some objective. For example, if 
> communication is performed across graph edges, one might want to minimize the 
> edges that cross from one partition to another.
> The problem of graph partitioning is a well studied problem and several 
> algorithms have been proposed in the literature. The goal of this project 
> would be to choose a few existing partitioning techniques and implement the 
> corresponding graph partitioning operators for Gelly.
> Some related literature can be found [here| 
> http://www.citeulike.org/user/vasiakalavri/tag/graph-partitioning].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1536) Graph partitioning operators for Gelly

2016-11-21 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683366#comment-15683366
 ] 

Vasia Kalavri commented on FLINK-1536:
--

Hi [~ivan.mushketyk]
afaik, nobody is currently working on this.

> Graph partitioning operators for Gelly
> --
>
> Key: FLINK-1536
> URL: https://issues.apache.org/jira/browse/FLINK-1536
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Priority: Minor
>
> Smart graph partitioning can significantly improve the performance and 
> scalability of graph analysis applications. Depending on the computation 
> pattern, a graph partitioning algorithm divides the graph into (maybe 
> overlapping) subgraphs, optimizing some objective. For example, if 
> communication is performed across graph edges, one might want to minimize the 
> edges that cross from one partition to another.
> The problem of graph partitioning is a well studied problem and several 
> algorithms have been proposed in the literature. The goal of this project 
> would be to choose a few existing partitioning techniques and implement the 
> corresponding graph partitioning operators for Gelly.
> Some related literature can be found [here| 
> http://www.citeulike.org/user/vasiakalavri/tag/graph-partitioning].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2254) Add Bipartite Graph Support for Gelly

2016-11-21 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683354#comment-15683354
 ] 

Vasia Kalavri commented on FLINK-2254:
--

Hey [~ivan.mushketyk],
I would start with the easy ones, i.e. counts and degrees. I would consider the 
clustering coefficient as a separate case, possibly as a library algorithm.

> Add Bipartite Graph Support for Gelly
> -
>
> Key: FLINK-2254
> URL: https://issues.apache.org/jira/browse/FLINK-2254
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>Assignee: Ivan Mushketyk
>  Labels: requires-design-doc
>
> A bipartite graph is a graph whose set of vertices can be divided into two 
> disjoint sets such that each edge has its source vertex in the first set and 
> its target vertex in the second set. We would like to support efficient 
> operations for this type of graph, along with a set of metrics 
> (http://jponnela.com/web_documents/twomode.pdf). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-5097) The TypeExtractor is missing input type information in some Graph methods

2016-11-21 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-5097:
-
Description: The TypeExtractor is called without information about the 
input type in {{mapVertices}} and {{mapEdges}} although this information can be 
easily retrieved.  (was: The TypeExtractor is called without information about 
the input type in {{mapVertices}}, {{mapVEdges}}, and {{fromDataSet}}, although 
this information can be easily retrieved.)

> The TypeExtractor is missing input type information in some Graph methods
> -
>
> Key: FLINK-5097
> URL: https://issues.apache.org/jira/browse/FLINK-5097
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> The TypeExtractor is called without information about the input type in 
> {{mapVertices}} and {{mapEdges}} although this information can be easily 
> retrieved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5097) The TypeExtractor is missing input type information in some Graph methods

2016-11-18 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-5097:


 Summary: The TypeExtractor is missing input type information in 
some Graph methods
 Key: FLINK-5097
 URL: https://issues.apache.org/jira/browse/FLINK-5097
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Reporter: Vasia Kalavri
Assignee: Vasia Kalavri


The TypeExtractor is called without information about the input type in 
{{mapVertices}}, {{mapVEdges}}, and {{fromDataSet}}, although this information 
can be easily retrieved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3551) Sync Scala and Java Streaming Examples

2016-11-16 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-3551:
-
Assignee: Lim Chee Hau

> Sync Scala and Java Streaming Examples
> --
>
> Key: FLINK-3551
> URL: https://issues.apache.org/jira/browse/FLINK-3551
> Project: Flink
>  Issue Type: Sub-task
>  Components: Examples
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Assignee: Lim Chee Hau
> Fix For: 1.0.1
>
>
> The Scala Examples lag behind the Java Examples



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration

2016-10-21 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-3888.
--
Resolution: Fixed

> Custom Aggregator with Convergence can't be registered directly with 
> DeltaIteration
> ---
>
> Key: FLINK-3888
> URL: https://issues.apache.org/jira/browse/FLINK-3888
> Project: Flink
>  Issue Type: Bug
>  Components: Iterations
>Reporter: Martin Liesenberg
>Assignee: Vasia Kalavri
> Fix For: 1.2.0
>
>
> Contrary to the 
> [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html]
>  the method to add an aggregator with a custom convergence criterion to a 
> DeltaIteration is not exposed directly to DeltaIteration, but can only be 
> accessed via the {{aggregatorRegistry}}.
> Moreover, when registering an aggregator with a custom convergence criterion  
> and running the program, the following exception appears in the logs:
> {noformat}
> Error: Cannot use custom convergence criterion with workset iteration. 
> Workset iterations have implicit convergence criterion where workset is empty.
> org.apache.flink.optimizer.CompilerException: Error: Cannot use custom 
> convergence criterion with workset iteration. Workset iterations have 
> implicit convergence criterion where workset is empty.
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164)
>   at 
> org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898)
>   at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>   at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
> {noformat}
> The issue has been found while discussing FLINK-2926



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration

2016-10-21 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-3888:
-
Fix Version/s: 1.2.0

> Custom Aggregator with Convergence can't be registered directly with 
> DeltaIteration
> ---
>
> Key: FLINK-3888
> URL: https://issues.apache.org/jira/browse/FLINK-3888
> Project: Flink
>  Issue Type: Bug
>  Components: Iterations
>Reporter: Martin Liesenberg
>Assignee: Vasia Kalavri
> Fix For: 1.2.0
>
>
> Contrary to the 
> [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html]
>  the method to add an aggregator with a custom convergence criterion to a 
> DeltaIteration is not exposed directly to DeltaIteration, but can only be 
> accessed via the {{aggregatorRegistry}}.
> Moreover, when registering an aggregator with a custom convergence criterion  
> and running the program, the following exception appears in the logs:
> {noformat}
> Error: Cannot use custom convergence criterion with workset iteration. 
> Workset iterations have implicit convergence criterion where workset is empty.
> org.apache.flink.optimizer.CompilerException: Error: Cannot use custom 
> convergence criterion with workset iteration. Workset iterations have 
> implicit convergence criterion where workset is empty.
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164)
>   at 
> org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898)
>   at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>   at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
> {noformat}
> The issue has been found while discussing FLINK-2926



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-4129) Remove the example HITSAlgorithm

2016-10-21 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-4129:
-
Issue Type: Improvement  (was: Bug)

> Remove the example HITSAlgorithm
> 
>
> Key: FLINK-4129
> URL: https://issues.apache.org/jira/browse/FLINK-4129
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
> Fix For: 1.2.0
>
>
> {{HITSAlgorithm}} tests for convergence by summing, over all vertices, the 
> difference between each authority score and the average score. This simply 
> compares the sum of scores against the previous sum of scores, which is not a 
> good test for convergence.
> {code}
> // count the diff value of sum of authority scores
> diffSumAggregator.aggregate(previousAuthAverage - 
> newAuthorityValue.getValue());
> {code}
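
A short self-contained sketch of the problem being pointed out (illustrative 
numbers only): the summed difference can be zero while individual scores still 
change a lot, so an element-wise measure is needed.

{code}
public class ConvergenceSketch {

    public static void main(String[] args) {
        double[] previous = {0.4, 0.6};
        double[] current  = {0.6, 0.4}; // scores swapped: same sum, large change

        double summedDiff = 0.0;
        double elementWiseDiff = 0.0;
        for (int i = 0; i < previous.length; i++) {
            summedDiff += current[i] - previous[i];
            elementWiseDiff += Math.abs(current[i] - previous[i]);
        }

        System.out.println(summedDiff);      // 0.0 -> looks "converged"
        System.out.println(elementWiseDiff); // 0.4 -> clearly has not converged
    }
}
{code}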



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-4129) Remove the example HITSAlgorithm

2016-10-21 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-4129.
--
Resolution: Fixed

> Remove the example HITSAlgorithm
> 
>
> Key: FLINK-4129
> URL: https://issues.apache.org/jira/browse/FLINK-4129
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
> Fix For: 1.2.0
>
>
> {{HITSAlgorithm}} tests for convergence by summing, over all vertices, the 
> difference between each authority score and the average score. This simply 
> compares the sum of scores against the previous sum of scores, which is not a 
> good test for convergence.
> {code}
> // count the diff value of sum of authority scores
> diffSumAggregator.aggregate(previousAuthAverage - 
> newAuthorityValue.getValue());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-4129) Remove the example HITSAlgorithm

2016-10-21 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-4129:
-
Summary: Remove the example HITSAlgorithm  (was: HITSAlgorithm should test 
for element-wise convergence)

> Remove the example HITSAlgorithm
> 
>
> Key: FLINK-4129
> URL: https://issues.apache.org/jira/browse/FLINK-4129
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
> Fix For: 1.2.0
>
>
> {{HITSAlgorithm}} tests for convergence by summing, over all vertices, the 
> difference between each authority score and the average score. This simply 
> compares the sum of scores against the previous sum of scores, which is not a 
> good test for convergence.
> {code}
> // count the diff value of sum of authority scores
> diffSumAggregator.aggregate(previousAuthAverage - 
> newAuthorityValue.getValue());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4129) HITSAlgorithm should test for element-wise convergence

2016-10-16 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579933#comment-15579933
 ] 

Vasia Kalavri commented on FLINK-4129:
--

I think having two HITS examples could be confusing to users. Is this 
implementation showcasing some feature that no other example does, or could we 
simply remove it in favor of the HITS driver?

> HITSAlgorithm should test for element-wise convergence
> --
>
> Key: FLINK-4129
> URL: https://issues.apache.org/jira/browse/FLINK-4129
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Priority: Minor
>
> {{HITSAlgorithm}} tests for convergence by summing, over all vertices, the 
> difference between each authority score and the average score. This simply 
> compares the sum of scores against the previous sum of scores, which is not a 
> good test for convergence.
> {code}
> // count the diff value of sum of authority scores
> diffSumAggregator.aggregate(previousAuthAverage - 
> newAuthorityValue.getValue());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1091) Allow joins with the solution set using key selectors

2016-10-16 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579877#comment-15579877
 ] 

Vasia Kalavri commented on FLINK-1091:
--

Hi [~neelesh77],
I'm not working on this. I've unassigned the issue. Do you have a use-case 
where you need this?

> Allow joins with the solution set using key selectors
> -
>
> Key: FLINK-1091
> URL: https://issues.apache.org/jira/browse/FLINK-1091
> Project: Flink
>  Issue Type: Sub-task
>  Components: Iterations
>Reporter: Vasia Kalavri
>Priority: Minor
>  Labels: easyfix, features
>
> Currently, joins with the solution set may only use tuple field positions.
> A possible solution would be to provide explicit functions "joinWithSolution" 
> and "coGroupWithSolution" to make sure the keys used are valid. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1091) Allow joins with the solution set using key selectors

2016-10-16 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-1091:
-
Assignee: (was: Vasia Kalavri)

> Allow joins with the solution set using key selectors
> -
>
> Key: FLINK-1091
> URL: https://issues.apache.org/jira/browse/FLINK-1091
> Project: Flink
>  Issue Type: Sub-task
>  Components: Iterations
>Reporter: Vasia Kalavri
>Priority: Minor
>  Labels: easyfix, features
>
> Currently, joins with the solution set may only use tuple field positions.
> A possible solution would be to provide explicit functions "joinWithSolution" 
> and "coGroupWithSolution" to make sure the keys used are valid. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration

2016-10-06 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri reassigned FLINK-3888:


Assignee: Vasia Kalavri

> Custom Aggregator with Convergence can't be registered directly with 
> DeltaIteration
> ---
>
> Key: FLINK-3888
> URL: https://issues.apache.org/jira/browse/FLINK-3888
> Project: Flink
>  Issue Type: Bug
>  Components: Iterations
>Reporter: Martin Liesenberg
>Assignee: Vasia Kalavri
>
> Contrary to the 
> [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html]
>  the method to add an aggregator with a custom convergence criterion to a 
> DeltaIteration is not exposed directly to DeltaIteration, but can only be 
> accessed via the {{aggregatorRegistry}}.
> Moreover, when registering an aggregator with a custom convergence criterion  
> and running the program, the following exception appears in the logs:
> {noformat}
> Error: Cannot use custom convergence criterion with workset iteration. 
> Workset iterations have implicit convergence criterion where workset is empty.
> org.apache.flink.optimizer.CompilerException: Error: Cannot use custom 
> convergence criterion with workset iteration. Workset iterations have 
> implicit convergence criterion where workset is empty.
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164)
>   at 
> org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898)
>   at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>   at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
> {noformat}
> The issue has been found while discussing FLINK-2926



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration

2016-10-03 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543256#comment-15543256
 ] 

Vasia Kalavri commented on FLINK-3888:
--

We shouldn't override the default convergence criterion of the delta iteration. 
When the workset is empty, there's no work to do. Instead, if a custom criterion 
is provided, the convergence condition should be the disjunction of the two.
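
A minimal sketch of the suggested termination logic (the method and flags are 
assumptions, not the optimizer's actual code):

{code}
public class TerminationSketch {

    // The delta iteration should stop when the workset is empty OR the
    // user-provided convergence criterion is met (the disjunction of the two),
    // rather than letting one criterion replace the other.
    static boolean shouldTerminate(boolean worksetEmpty, boolean customCriterionMet) {
        return worksetEmpty || customCriterionMet;
    }

    public static void main(String[] args) {
        System.out.println(shouldTerminate(true, false));  // true
        System.out.println(shouldTerminate(false, true));  // true
        System.out.println(shouldTerminate(false, false)); // false
    }
}
{code}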

> Custom Aggregator with Convergence can't be registered directly with 
> DeltaIteration
> ---
>
> Key: FLINK-3888
> URL: https://issues.apache.org/jira/browse/FLINK-3888
> Project: Flink
>  Issue Type: Bug
>  Components: Iterations
>Reporter: Martin Liesenberg
>
> Contrary to the 
> [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html]
>  the method to add an aggregator with a custom convergence criterion to a 
> DeltaIteration is not exposed directly to DeltaIteration, but can only be 
> accessed via the {{aggregatorRegistry}}.
> Moreover, when registering an aggregator with a custom convergence criterion  
> and running the program, the following exception appears in the logs:
> {noformat}
> Error: Cannot use custom convergence criterion with workset iteration. 
> Workset iterations have implicit convergence criterion where workset is empty.
> org.apache.flink.optimizer.CompilerException: Error: Cannot use custom 
> convergence criterion with workset iteration. Workset iterations have 
> implicit convergence criterion where workset is empty.
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164)
>   at 
> org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898)
>   at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>   at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
> {noformat}
> The issue has been found while discussing FLINK-2926



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration

2016-10-03 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-3888:
-
Component/s: Iterations

> Custom Aggregator with Convergence can't be registered directly with 
> DeltaIteration
> ---
>
> Key: FLINK-3888
> URL: https://issues.apache.org/jira/browse/FLINK-3888
> Project: Flink
>  Issue Type: Bug
>  Components: Iterations
>Reporter: Martin Liesenberg
>
> Contrary to the 
> [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html]
>  the method to add an aggregator with a custom convergence criterion to a 
> DeltaIteration is not exposed directly to DeltaIteration, but can only be 
> accessed via the {{aggregatorRegistry}}.
> Moreover, when registering an aggregator with a custom convergence criterion  
> and running the program, the following exception appears in the logs:
> {noformat}
> Error: Cannot use custom convergence criterion with workset iteration. 
> Workset iterations have implicit convergence criterion where workset is empty.
> org.apache.flink.optimizer.CompilerException: Error: Cannot use custom 
> convergence criterion with workset iteration. Workset iterations have 
> implicit convergence criterion where workset is empty.
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164)
>   at 
> org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898)
>   at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>   at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
> {noformat}
> The issue has been found while discussing FLINK-2926



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3888) Custom Aggregator with Convergence can't be registered directly with DeltaIteration

2016-10-03 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543133#comment-15543133
 ] 

Vasia Kalavri commented on FLINK-3888:
--

I had a quick look into this and I don't see any fundamental reason why we 
can't add a custom convergence criterion in delta iterations. It seems that it 
is currently allowed to add one convergence criterion only, and since delta 
iterations have the {{WorksetEmptyConvergenceCriterion}} by default, adding a 
custom one is not possible.

So, one solution could be the following:
1. use {{TaskConfig.setConvergenceCriterion()}} to set the custom, user-defined 
convergence criterion (like in the case of bulk iteration)
2. add a new method {{TaskConfig.setDefaultConvergeCriterion()}} to add the 
default empty workset convergence
3. check both criteria in {{IterationSynchronizationSinkTask.checkForConvergence()}}
4. expose the custom convergence criterion in {{DeltaIteration}}

If I'm not missing something and this seems acceptable I'd like to resolve this 
issue. Custom convergence would be helpful in several Gelly algorithms.

> Custom Aggregator with Convergence can't be registered directly with 
> DeltaIteration
> ---
>
> Key: FLINK-3888
> URL: https://issues.apache.org/jira/browse/FLINK-3888
> Project: Flink
>  Issue Type: Bug
>  Components: Iterations
>Reporter: Martin Liesenberg
>
> Contrary to the 
> [documentation|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html]
>  the method to add an aggregator with a custom convergence criterion to a 
> DeltaIteration is not exposed directly to DeltaIteration, but can only be 
> accessed via the {{aggregatorRegistry}}.
> Moreover, when registering an aggregator with a custom convergence criterion  
> and running the program, the following exception appears in the logs:
> {noformat}
> Error: Cannot use custom convergence criterion with workset iteration. 
> Workset iterations have implicit convergence criterion where workset is empty.
> org.apache.flink.optimizer.CompilerException: Error: Cannot use custom 
> convergence criterion with workset iteration. Workset iterations have 
> implicit convergence criterion where workset is empty.
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.finalizeWorksetIteration(JobGraphGenerator.java:1518)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:198)
>   at 
> org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:164)
>   at 
> org.apache.flink.test.util.TestEnvironment.execute(TestEnvironment.java:76)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898)
>   at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>   at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
> {noformat}
> The issue has been found while discussing FLINK-2926



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1815) Add methods to read and write a Graph as adjacency list

2016-09-30 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536120#comment-15536120
 ] 

Vasia Kalavri commented on FLINK-1815:
--

The motivation for this issue was to (1) provide an easy way to read graph data 
that is stored in adjacency list format and (2) provide a way to store a graph 
as an adjacency list instead of an edge list, since adjacency list format is 
more compact. Why do you think we're creating our own format? Adjacency list 
format is a CSV file format, i.e. each line contains a vertex followed by the 
list of neighbors.
Regarding the last 2 points, I guess you're referring to the PR implementation. 
I haven't reviewed that, but if there is parsing functionality that we can use, 
we should.
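To make the parsing side concrete, here is a tiny, self-contained sketch (an illustration only, with assumed delimiters: whitespace between the vertex and its neighbor list, commas between neighbors; this is not the PR's code):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch: turn one adjacency-list line, e.g. "1 2,3,4", into (source, target) edge pairs.
public class AdjacencyLineParser {

    static List<long[]> parseLine(String line) {
        String[] parts = line.trim().split("\\s+", 2);   // vertex id, then the neighbor list
        long source = Long.parseLong(parts[0]);
        List<long[]> edges = new ArrayList<>();
        if (parts.length > 1 && !parts[1].isEmpty()) {
            for (String neighbor : parts[1].split(",")) {
                edges.add(new long[]{source, Long.parseLong(neighbor.trim())});
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        for (long[] e : parseLine("1 2,3,4")) {
            System.out.println(e[0] + " -> " + e[1]);    // 1 -> 2, 1 -> 3, 1 -> 4
        }
    }
}
{code}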

> Add methods to read and write a Graph as adjacency list
> ---
>
> Key: FLINK-1815
> URL: https://issues.apache.org/jira/browse/FLINK-1815
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Faye Beligianni
>Priority: Minor
>
> It would be nice to add utility methods to read a graph from an Adjacency 
> list format and also write a graph in such a format.
> The simple case would be to read a graph with no vertex or edge values, where 
> we would need to define (a) a line delimiter, (b) a delimiter to separate 
> vertices from the neighbor list and (c) a delimiter to separate the neighbors.
> For example, "1 2,3,4\n2 1,3" would give vertex 1 with neighbors 2, 3 and 4 
> and vertex 2 with neighbors 1 and 3.
> If we have vertex values and/or edge values, we also need to have a way to 
> separate IDs from values. For example, we could have "1 0.1 2 0.5, 3 0.2" to 
> define a vertex 1 with value 0.1, edge (1, 2) with weight 0.5 and edge (1, 3) 
> with weight 0.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1815) Add methods to read and write a Graph as adjacency list

2016-09-30 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535748#comment-15535748
 ] 

Vasia Kalavri commented on FLINK-1815:
--

Hi [~fobeligi],
representing a graph as an adjacency list is a separate and larger issue, which 
should be handled independently from this one in my view. If I'm not mistaken, 
in PR #2178 you have already implemented the functionality described in this 
issue. If there are no outstanding comments in the PR or issues to address, I 
think we should merge it.


> Add methods to read and write a Graph as adjacency list
> ---
>
> Key: FLINK-1815
> URL: https://issues.apache.org/jira/browse/FLINK-1815
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Faye Beligianni
>Priority: Minor
>
> It would be nice to add utility methods to read a graph from an Adjacency 
> list format and also write a graph in such a format.
> The simple case would be to read a graph with no vertex or edge values, where 
> we would need to define (a) a line delimiter, (b) a delimiter to separate 
> vertices from the neighbor list and (c) a delimiter to separate the neighbors.
> For example, "1 2,3,4\n2 1,3" would give vertex 1 with neighbors 2, 3 and 4 
> and vertex 2 with neighbors 1 and 3.
> If we have vertex values and/or edge values, we also need to have a way to 
> separate IDs from values. For example, we could have "1 0.1 2 0.5, 3 0.2" to 
> define a vertex 1 with value 0.1, edge (1, 2) with weight 0.5 and edge (1, 3) 
> with weight 0.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4646) Add BipartiteGraph class

2016-09-22 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512880#comment-15512880
 ] 

Vasia Kalavri commented on FLINK-4646:
--

Hi [~StephanEwen], thanks! We've discussed this in the parent issue, yes.

> Add BipartiteGraph class
> 
>
> Key: FLINK-4646
> URL: https://issues.apache.org/jira/browse/FLINK-4646
> Project: Flink
>  Issue Type: Sub-task
>  Components: Gelly
>Reporter: Ivan Mushketyk
>
> Implement a class to represent a bipartite graph in Flink Gelly. Design 
> discussions can be found in the parent task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2254) Add Bipartite Graph Support for Gelly

2016-09-15 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493427#comment-15493427
 ] 

Vasia Kalavri commented on FLINK-2254:
--

It sounds good to me [~ivan.mushketyk]. Please go ahead and break this into 
smaller tasks. We can then prioritize. Thanks!

> Add Bipartite Graph Support for Gelly
> -
>
> Key: FLINK-2254
> URL: https://issues.apache.org/jira/browse/FLINK-2254
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>Assignee: Ivan Mushketyk
>  Labels: requires-design-doc
>
> A bipartite graph is a graph for which the set of vertices can be divided 
> into two disjoint sets such that each edge has its source vertex in the 
> first set and its target vertex in the second set. We would like to 
> support efficient operations for this type of graph, along with a set of 
> metrics (http://jponnela.com/web_documents/twomode.pdf). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2254) Add Bipartite Graph Support for Gelly

2016-09-11 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15482143#comment-15482143
 ] 

Vasia Kalavri commented on FLINK-2254:
--

Hi [~ivan.mushketyk]
thanks a lot for creating the design document!
I agree with [~greghogan] that separate classes might work better than a common 
hierarchy. Actually, I don't really see the intuition behind adding the 
{{BaseGraph}} interface. What methods would you add there that would be common 
to both bipartite and other graphs?
As for the vertex Ids, in the general case those might be different. Think e.g. 
movie-actor graphs or user-tweet graphs. If it simplifies the design, we could 
initially have a first implementation with the same id type for top and bottom 
vertices and use a label to distinguish them instead. Let me know what you both 
think.
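To illustrate the "different Id types" case, one possible representation could look like the sketch below (an illustration of the idea only, not a proposed API; all names, types, and toy data are assumptions):

{code:java}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.graph.Vertex;
import org.apache.flink.types.NullValue;

// Sketch of a movie-actor bipartite graph where the two vertex modes have
// different id types: actors keyed by String names, movies keyed by Long ids.
public class BipartiteRepresentationSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Vertex<String, NullValue>> actors = env.fromElements(
                new Vertex<>("alice", NullValue.getInstance()),
                new Vertex<>("bob", NullValue.getInstance()));

        DataSet<Vertex<Long, NullValue>> movies = env.fromElements(
                new Vertex<>(100L, NullValue.getInstance()),
                new Vertex<>(200L, NullValue.getInstance()));

        // Bipartite edges always connect the "top" mode (actors) to the "bottom" mode (movies).
        DataSet<Tuple2<String, Long>> edges = env.fromElements(
                Tuple2.of("alice", 100L), Tuple2.of("bob", 200L));

        actors.print();
        movies.print();
        edges.print();
    }
}
{code}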

> Add Bipartite Graph Support for Gelly
> -
>
> Key: FLINK-2254
> URL: https://issues.apache.org/jira/browse/FLINK-2254
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>Assignee: Ivan Mushketyk
>  Labels: requires-design-doc
>
> A bipartite graph is a graph for which the set of vertices can be divided 
> into two disjoint sets such that each edge has its source vertex in the 
> first set and its target vertex in the second set. We would like to 
> support efficient operations for this type of graph, along with a set of 
> metrics (http://jponnela.com/web_documents/twomode.pdf). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2016-09-11 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15481915#comment-15481915
 ] 

Vasia Kalavri commented on FLINK-1707:
--

Hi [~joseprupi],
I see that in order to create the graph you are looping over the similarity 
matrix and creating lists for I-type and E-type vertices. This implementation 
assumes that your matrix and the vertex lists fit in the memory of a single 
machine. Could you instead read the similarity matrix into vertex and edge 
DataSets directly?
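As a rough sketch of what that could look like (assuming the similarities are stored as one "source target similarity" triple per line; the path and types are made up, and this is not part of the PR):

{code:java}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.graph.Edge;
import org.apache.flink.graph.Graph;
import org.apache.flink.types.NullValue;

// Sketch: read an "i j similarity" file directly into an edge DataSet and
// build the graph from it, without materializing the matrix on one machine.
public class SimilarityMatrixToGraph {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple3<Long, Long, Double>> similarities = env
                .readCsvFile("/path/to/similarities.tsv")   // made-up path
                .fieldDelimiter("\t")
                .types(Long.class, Long.class, Double.class);

        DataSet<Edge<Long, Double>> edges = similarities.map(
                new MapFunction<Tuple3<Long, Long, Double>, Edge<Long, Double>>() {
                    public Edge<Long, Double> map(Tuple3<Long, Long, Double> t) {
                        return new Edge<>(t.f0, t.f1, t.f2);
                    }
                });

        // Vertices are derived from the edge endpoints; zero similarities can simply be omitted.
        Graph<Long, NullValue, Double> graph = Graph.fromDataSet(edges, env);
        graph.getVertices().print();
    }
}
{code}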

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Josep Rubió
>Priority: Minor
>  Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation 
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric 
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing
> Example spreadsheet:
> https://docs.google.com/spreadsheets/d/1CurZCBP6dPb1IYQQIgUHVjQdyLxK0JDGZwlSXCzBcvA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4522) Gelly link broken in homepage

2016-08-29 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445732#comment-15445732
 ] 

Vasia Kalavri commented on FLINK-4522:
--

Hi Greg,
what is the current redirect? The gelly link on flink.apache.org currently 
gives a 404.

> Gelly link broken in homepage
> -
>
> Key: FLINK-4522
> URL: https://issues.apache.org/jira/browse/FLINK-4522
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Gelly
>Affects Versions: 1.1.0, 1.1.1
>Reporter: Vasia Kalavri
>
> The link to the Gelly documentation is broken in the Flink homepage. The link 
> points to "docs/apis/batch/libs/gelly.md" which has been removed. Since this 
> link might be present in other places as well, e.g. slides, trainings, etc., 
> we should re-direct to the new location of the Gelly docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2254) Add Bipartite Graph Support for Gelly

2016-08-28 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444180#comment-15444180
 ] 

Vasia Kalavri commented on FLINK-2254:
--

Hi [~ivan.mushketyk],

what you are describing in "we may consider that two actors are connected if 
they have a movie in common" is called a projection. That is, you take one of 
the modes of a bipartite graph and project it into a new graph. It would be 
nice to support projections for bipartite graphs, but maybe that's something we 
can do after we have basic support for bipartite graphs.

By basic support, I mean that we first need to provide a way to represent and 
define bipartite graphs, e.g. each vertex mode can be a separate dataset with 
different id and value types. I would start by thinking how it is best to 
represent bipartite graphs and sketch the API for creating them. Then, we 
should go over the existing graph methods and algorithms and see if/how they 
need to be adapted for bipartite graphs, e.g. what should {{getVertices()}} 
return?

Regarding the support of efficient operations, the idea is to exploit the 
knowledge that no edges exist between vertices of the same mode. This might 
let us perform some operations more efficiently, by e.g. filtering out edges / 
vertices, using certain join strategies, etc.
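To make the projection idea concrete, a rough DataSet-level sketch follows (an illustration only, not a proposed API; names and the toy data are made up): connecting two top-mode vertices whenever they share a bottom-mode neighbor boils down to a self-join of the bipartite edge list on the bottom id.

{code:java}
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

// Sketch: project the "actor" mode of an actor-movie bipartite edge list.
// Two actors become connected if they appear in the same movie.
public class BipartiteProjectionSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // (actorId, movieId) bipartite edges -- toy data.
        DataSet<Tuple2<Long, Long>> actorMovie = env.fromElements(
                Tuple2.of(1L, 100L), Tuple2.of(2L, 100L), Tuple2.of(3L, 200L));

        DataSet<Tuple2<Long, Long>> actorActor = actorMovie
                .join(actorMovie)
                .where(1).equalTo(1)                         // same movie id
                .with(new JoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>>() {
                    public Tuple2<Long, Long> join(Tuple2<Long, Long> left, Tuple2<Long, Long> right) {
                        return Tuple2.of(left.f0, right.f0); // pair of actors sharing the movie
                    }
                })
                .filter(new FilterFunction<Tuple2<Long, Long>>() {
                    public boolean filter(Tuple2<Long, Long> pair) {
                        return pair.f0 < pair.f1;            // drop self-pairs and mirrored duplicates
                    }
                })
                .distinct();

        actorActor.print();   // (1,2): actors 1 and 2 share movie 100
    }
}
{code}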

> Add Bipartite Graph Support for Gelly
> -
>
> Key: FLINK-2254
> URL: https://issues.apache.org/jira/browse/FLINK-2254
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>Assignee: Ivan Mushketyk
>  Labels: requires-design-doc
>
> A bipartite graph is a graph for which the set of vertices can be divided 
> into two disjoint sets such that each edge has its source vertex in the 
> first set and its target vertex in the second set. We would like to 
> support efficient operations for this type of graph, along with a set of 
> metrics (http://jponnela.com/web_documents/twomode.pdf). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2016-08-28 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15443318#comment-15443318
 ] 

Vasia Kalavri commented on FLINK-1707:
--

Hi [~joseprupi],

thanks for the update. Regarding the initialization, I'm sorry that nobody 
replied to your email. Just a thought: couldn't you simply read the similarity 
matrix line-by-line and create the graph edges with their weights from it?

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Josep Rubió
>Priority: Minor
>  Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation 
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric 
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing
> Example spreadsheet:
> https://docs.google.com/spreadsheets/d/1CurZCBP6dPb1IYQQIgUHVjQdyLxK0JDGZwlSXCzBcvA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4522) Gelly link broken in homepage

2016-08-28 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-4522:


 Summary: Gelly link broken in homepage
 Key: FLINK-4522
 URL: https://issues.apache.org/jira/browse/FLINK-4522
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Gelly
Affects Versions: 1.1.0, 1.1.1
Reporter: Vasia Kalavri


The link to the Gelly documentation is broken in the Flink homepage. The link 
points to "docs/apis/batch/libs/gelly.md" which has been removed. Since this 
link might be present in other places as well, e.g. slides, trainings, etc., we 
should re-direct to the new location of the Gelly docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4440) Make API for edge/vertex creation less verbose

2016-08-26 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439178#comment-15439178
 ] 

Vasia Kalavri commented on FLINK-4440:
--

Alright, makes sense. Let's close this then. Is that fine with you 
[~ivan.mushketyk] or is there anything we're overlooking here?

> Make API for edge/vertex creation less verbose
> --
>
> Key: FLINK-4440
> URL: https://issues.apache.org/jira/browse/FLINK-4440
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>Priority: Trivial
>
> It would be better if one could create vertex/edges like this:
> {code:java}
> Vertex<Long, NullValue> v = Vertex.create(42);
> Edge<Long, NullValue> e = Edge.create(5, 6);
> {code}
> Instead of this:
> {code:java}
> Vertex<Long, NullValue> v = new Vertex<Long, NullValue>(42, NullValue.getInstance());
> Edge<Long, NullValue> e = new Edge<Long, NullValue>(5, 6, NullValue.getInstance());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4440) Make API for edge/vertex creation less verbose

2016-08-25 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437307#comment-15437307
 ] 

Vasia Kalavri commented on FLINK-4440:
--

I don't have a strong opinion against adding the {{create}} methods. Why do you 
think there is a cost to adding them [~greghogan]?

> Make API for edge/vertex creation less verbose
> --
>
> Key: FLINK-4440
> URL: https://issues.apache.org/jira/browse/FLINK-4440
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>Priority: Trivial
>
> It would be better if one could create vertex/edges like this:
> {code:java}
> Vertex<Long, NullValue> v = Vertex.create(42);
> Edge<Long, NullValue> e = Edge.create(5, 6);
> {code}
> Instead of this:
> {code:java}
> Vertex<Long, NullValue> v = new Vertex<Long, NullValue>(42, NullValue.getInstance());
> Edge<Long, NullValue> e = new Edge<Long, NullValue>(5, 6, NullValue.getInstance());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (FLINK-2254) Add Bipartite Graph Support for Gelly

2016-08-19 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-2254:
-
Comment: was deleted

(was: Hi [~ivan.mushketyk],
thank you for your interest. As far as I know, nobody is currently working on 
this. If you want to take over, let me know and I'll assign it to you. It would 
be great if you could revise the design doc or feel free to create your own 
from scratch.)

> Add Bipartite Graph Support for Gelly
> -
>
> Key: FLINK-2254
> URL: https://issues.apache.org/jira/browse/FLINK-2254
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>  Labels: requires-design-doc
>
> A bipartite graph is a graph for which the set of vertices can be divided 
> into two disjoint sets such that each edge has its source vertex in the 
> first set and its target vertex in the second set. We would like to 
> support efficient operations for this type of graph, along with a set of 
> metrics (http://jponnela.com/web_documents/twomode.pdf). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2254) Add Bipartite Graph Support for Gelly

2016-08-19 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428076#comment-15428076
 ] 

Vasia Kalavri commented on FLINK-2254:
--

Hi [~ivan.mushketyk],
thank you for your interest. As far as I know, nobody is currently working on 
this. If you want to take over, let me know and I'll assign it to you. It would 
be great if you could revise the design doc or feel free to create your own 
from scratch.

> Add Bipartite Graph Support for Gelly
> -
>
> Key: FLINK-2254
> URL: https://issues.apache.org/jira/browse/FLINK-2254
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>  Labels: requires-design-doc
>
> A bipartite graph is a graph for which the set of vertices can be divided 
> into two disjoint sets such that each edge has its source vertex in the 
> first set and its target vertex in the second set. We would like to 
> support efficient operations for this type of graph, along with a set of 
> metrics (http://jponnela.com/web_documents/twomode.pdf). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2254) Add Bipartite Graph Support for Gelly

2016-08-19 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428075#comment-15428075
 ] 

Vasia Kalavri commented on FLINK-2254:
--

Hi [~ivan.mushketyk],
thank you for your interest. As far as I know, nobody is currently working on 
this. If you want to take over, let me know and I'll assign it to you. It would 
be great if you could revise the design doc or feel free to create your own 
from scratch.

> Add Bipartite Graph Support for Gelly
> -
>
> Key: FLINK-2254
> URL: https://issues.apache.org/jira/browse/FLINK-2254
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>  Labels: requires-design-doc
>
> A bipartite graph is a graph for which the set of vertices can be divided 
> into two disjoint sets such that each edge has its source vertex in the 
> first set and its target vertex in the second set. We would like to 
> support efficient operations for this type of graph, along with a set of 
> metrics (http://jponnela.com/web_documents/twomode.pdf). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2016-08-15 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421162#comment-15421162
 ] 

Vasia Kalavri commented on FLINK-1707:
--

Hi [~joseprupi],
I haven't had time to look at your example yet :/
Regarding your first question, if you skip damping, is it guaranteed that the 
algorithm still converges and produces the correct value? It would just 
converge more slowly?
Regarding your other questions, I think it's more natural to have the 
graph itself as input with the similarities as edge values. This way (1) you 
won't need to convert any matrix to a graph and (2) nodes have non-zero 
similarities with their neighbors only. Does this make sense?

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Josep Rubió
>Priority: Minor
>  Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation 
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric 
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing
> Example spreadsheet:
> https://docs.google.com/spreadsheets/d/1CurZCBP6dPb1IYQQIgUHVjQdyLxK0JDGZwlSXCzBcvA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2016-08-08 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412194#comment-15412194
 ] 

Vasia Kalavri commented on FLINK-1707:
--

Thank you for the example [~joseprupi]. I'll try to get back to you soon!

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Josep Rubió
>Priority: Minor
>  Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation 
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric 
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing
> Example spreadsheet:
> https://docs.google.com/spreadsheets/d/1CurZCBP6dPb1IYQQIgUHVjQdyLxK0JDGZwlSXCzBcvA/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4204) Clean up gelly-examples

2016-07-14 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376834#comment-15376834
 ] 

Vasia Kalavri commented on FLINK-4204:
--

Separating drivers and examples sounds like a good idea. Do you think we should 
add drivers for every library algorithm? Isn't it enough to provide 1-2 
examples and have good documentation about how users can write their own 
drivers?

> Clean up gelly-examples
> ---
>
> Key: FLINK-4204
> URL: https://issues.apache.org/jira/browse/FLINK-4204
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>
> The gelly-examples has grown quite big (14 examples) and contains several 
> examples that illustrate the same functionality. Examples should help users 
> understand how to use the API and ideally show how to use 1-2 features.
> Also, it is helpful to state the purpose of each example in the comments.
> We should keep the example set small and move everything that does not fit 
> there to the library.
> I propose to remove the following:
> - ClusteringCoefficient: the functionality already exists as a library method.
> - HITS: the functionality already exists as a library method.
> - JaccardIndex: the functionality already exists as a library method.
> - SingleSourceShortestPaths: the example shows how to use scatter-gather 
> iterations. HITSAlgorithm shows the same feature plus the use of aggregators. 
> I propose we keep this one instead.
> - TriangleListing: the functionality already exists as a library method



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4204) Clean up gelly-examples

2016-07-12 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373767#comment-15373767
 ] 

Vasia Kalavri commented on FLINK-4204:
--

[~greghogan] let me know what you think!

> Clean up gelly-examples
> ---
>
> Key: FLINK-4204
> URL: https://issues.apache.org/jira/browse/FLINK-4204
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>
> The gelly-examples has grown quite big (14 examples) and contains several 
> examples that illustrate the same functionality. Examples should help users 
> understand how to use the API and ideally show how to use 1-2 features.
> Also, it is helpful to state the purpose of each example in the comments.
> We should keep the example set small and move everything that does not fit 
> there to the library.
> I propose to remove the following:
> - ClusteringCoefficient: the functionality already exists as a library method.
> - HITS: the functionality already exists as a library method.
> - JaccardIndex: the functionality already exists as a library method.
> - SingleSourceShortestPaths: the example shows how to use scatter-gather 
> iterations. HITSAlgorithm shows the same feature plus the use of aggregators. 
> I propose we keep this one instead.
> - TriangleListing: the functionality already exists as a library method



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4204) Clean up gelly-examples

2016-07-12 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-4204:


 Summary: Clean up gelly-examples
 Key: FLINK-4204
 URL: https://issues.apache.org/jira/browse/FLINK-4204
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 1.1.0
Reporter: Vasia Kalavri


The gelly-examples has grown quite big (14 examples) and contains several 
examples that illustrate the same functionality. Examples should help users 
understand how to use the API and ideally show how to use 1-2 features.
Also, it is helpful to state the purpose of each example in the comments.
We should keep the example set small and move everything that does not fit 
there to the library.
I propose to remove the following:
- ClusteringCoefficient: the functionality already exists as a library method.
- HITS: the functionality already exists as a library method.
- JaccardIndex: the functionality already exists as a library method.
- SingleSourceShortestPaths: the example shows how to use scatter-gather 
iterations. HITSAlgorithm shows the same feature plus the use of aggregators. I 
propose we keep this one instead.
- TriangleListing: the functionality already exists as a library method



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2352) [Graph Visualization] Integrate Gelly with Gephi

2016-06-30 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri closed FLINK-2352.

Resolution: Won't Fix

I agree with Greg. Gephi doesn't seem like a good fit.

> [Graph Visualization] Integrate Gelly with Gephi
> 
>
> Key: FLINK-2352
> URL: https://issues.apache.org/jira/browse/FLINK-2352
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>
> This integration will allow users to see the real-time progress of their 
> graph. They could also visually verify results for clustering algorithms, for 
> example. Gephi is free/open-source and provides support for all types of 
> networks, including dynamic and hierarchical graphs. 
> A first step would be to add the Gephi Toolkit to the pom.xml.
> https://github.com/gephi/gephi-toolkit
> Afterwards, a GraphBuilder similar to this one
> https://github.com/palmerabollo/test-twitter-graph/blob/master/src/main/java/es/guido/twitter/graph/GraphBuilder.java
> can be implemented. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1759) Execution statistics for vertex-centric iterations

2016-06-30 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357419#comment-15357419
 ] 

Vasia Kalavri commented on FLINK-1759:
--

I think it would be very helpful. A couple of users have asked for this 
recently.

> Execution statistics for vertex-centric iterations
> --
>
> Key: FLINK-1759
> URL: https://issues.apache.org/jira/browse/FLINK-1759
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Priority: Minor
>
> It would be nice to add an option for gathering execution statistics from 
> VertexCentricIteration.
> In particular, the following metrics could be useful:
> - total number of supersteps
> - number of messages sent (total / per superstep)
> - bytes of messages exchanged (total / per superstep)
> - execution time (total / per superstep)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2016-06-29 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355673#comment-15355673
 ] 

Vasia Kalavri commented on FLINK-1707:
--

Thanks [~joseprupi]. If you could document the example, it should make it clear 
whether we can find a scalable implementation or not.
For the original AP, it should be easy to port the Giraph implementation 
(described in the link of this JIRA description) to Gelly.

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Josep Rubió
>Priority: Minor
>  Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation 
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric 
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-06-29 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355664#comment-15355664
 ] 

Vasia Kalavri commented on FLINK-3879:
--

You and I are the only Gelly component "shepherds", but that doesn't mean we 
are the only ones that can review Gelly PRs. Any committer can help :)
I like the idea about improving our process. I went through the JIRAs last week 
and pinged a few people, released some, closed some, but we can certainly clean 
up more.
Big +1 for roadmap => JIRA => implementation discussion => PR.
Do you think we should add this to the wiki / contribution guidelines, or start a 
discussion on the dev list?

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3965) Delegating GraphAlgorithm

2016-06-23 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346395#comment-15346395
 ] 

Vasia Kalavri commented on FLINK-3965:
--

The functionality is much needed in my opinion. My concern is what we expose to 
users and what we keep internal to Gelly. If the {{DelegatingGraphAlgorithm}} 
and the {{GraphAnalytic}} are intended for users, then we should make their 
functionalities and differences very clear in the docs, including examples. 
Maybe that can be done as part of FLINK-4104?

> Delegating GraphAlgorithm
> -
>
> Key: FLINK-3965
> URL: https://issues.apache.org/jira/browse/FLINK-3965
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Complex and related algorithms often overlap in computation of data. Two such 
> examples are:
> 1) the local and global clustering coefficients each use a listing of 
> triangles
> 2) the local clustering coefficient joins on vertex degree, and the 
> underlying triangle listing annotates edge degree which uses vertex degree
> We can reuse and rewrite algorithm output by creating a {{ProxyObject}} as a 
> delegate for method calls to the {{DataSet}} returned by the algorithm.
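For readers unfamiliar with the delegation trick described above, a generic JDK-dynamic-proxy sketch of the idea follows (an illustration only; Gelly's actual {{ProxyObject}}-based implementation is more involved, and all names below are made up): a proxy can be handed out immediately, while the real object it forwards to is bound later, so several algorithms can end up sharing one underlying result.

{code:java}
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

// Sketch of delegation via a JDK dynamic proxy; names are illustrative only.
public class DelegatingResultSketch {

    interface Result {                 // stand-in for the algorithm's output type
        List<String> values();
    }

    static class Delegate implements InvocationHandler {
        private Result target;         // bound once the shared computation is wired up

        void setTarget(Result target) { this.target = target; }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            return method.invoke(target, args);   // forward every call to the real result
        }
    }

    public static void main(String[] args) {
        Delegate delegate = new Delegate();
        Result proxy = (Result) Proxy.newProxyInstance(
                Result.class.getClassLoader(), new Class<?>[]{Result.class}, delegate);

        // Later, the proxy is pointed at the concrete, shared result.
        delegate.setTarget(() -> Arrays.asList("shared", "output"));
        System.out.println(proxy.values());       // [shared, output]
    }
}
{code}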



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4101) Calculating average in Flink DataStream on window time

2016-06-23 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346204#comment-15346204
 ] 

Vasia Kalavri commented on FLINK-4101:
--

Hi [~mrakki3110], JIRA is used for reporting bugs or proposing new features. 
Your question should be posted either in the user mailing list or SO, as you've 
already done. I'm closing this.

> Calculating average in Flink DataStream on window time
> --
>
> Key: FLINK-4101
> URL: https://issues.apache.org/jira/browse/FLINK-4101
> Project: Flink
>  Issue Type: Task
>  Components: DataStream API
>Affects Versions: 1.0.2
>Reporter: Akshay Shingote
>
> I am using the Flink DataStream API where there are racks available & I 
> want to calculate the "average" temperature grouped by rack IDs. My window 
> duration is 40 seconds & my window slides every 10 seconds... Following 
> is my code where I am calculating the sum of temperatures every 10 seconds for 
> every rackID, but now I want to calculate average temperatures:
> static Properties properties=new Properties();
> public static Properties getProperties()
> {
> properties.setProperty("bootstrap.servers", "54.164.200.104:9092");
> properties.setProperty("zookeeper.connect", "54.164.200.104:2181");
> //properties.setProperty("deserializer.class", 
> "kafka.serializer.StringEncoder");
> //properties.setProperty("group.id", "akshay");
> properties.setProperty("auto.offset.reset", "earliest");
> return properties;
> }
>  @SuppressWarnings("rawtypes")
> public static void main(String[] args) throws Exception 
> {
> StreamExecutionEnvironment 
> env=StreamExecutionEnvironment.getExecutionEnvironment();
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
> Properties props=Program.getProperties();
> DataStream dstream=env.addSource(new 
> FlinkKafkaConsumer09("TemperatureEvent", new 
> TemperatureEventSchema(), props)).assignTimestampsAndWatermarks(new 
> IngestionTimeExtractor<>());
> DataStream 
> ds1=dstream.keyBy("rackId").timeWindow(Time.seconds(40), 
> Time.seconds(10)).sum("temperature");
> env.execute("Temperature Consumer");
> }
> How can I calculate the average temperature for the above example?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-4101) Calculating average in Flink DataStream on window time

2016-06-23 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri closed FLINK-4101.

Resolution: Invalid

> Calculating average in Flink DataStream on window time
> --
>
> Key: FLINK-4101
> URL: https://issues.apache.org/jira/browse/FLINK-4101
> Project: Flink
>  Issue Type: Task
>  Components: DataStream API
>Affects Versions: 1.0.2
>Reporter: Akshay Shingote
>
> I am using the Flink DataStream API where there are racks available & I 
> want to calculate the "average" temperature grouped by rack IDs. My window 
> duration is 40 seconds & my window slides every 10 seconds... Following 
> is my code where I am calculating the sum of temperatures every 10 seconds for 
> every rackID, but now I want to calculate average temperatures:
> static Properties properties=new Properties();
> public static Properties getProperties()
> {
> properties.setProperty("bootstrap.servers", "54.164.200.104:9092");
> properties.setProperty("zookeeper.connect", "54.164.200.104:2181");
> //properties.setProperty("deserializer.class", 
> "kafka.serializer.StringEncoder");
> //properties.setProperty("group.id", "akshay");
> properties.setProperty("auto.offset.reset", "earliest");
> return properties;
> }
>  @SuppressWarnings("rawtypes")
> public static void main(String[] args) throws Exception 
> {
> StreamExecutionEnvironment 
> env=StreamExecutionEnvironment.getExecutionEnvironment();
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
> Properties props=Program.getProperties();
> DataStream dstream=env.addSource(new 
> FlinkKafkaConsumer09("TemperatureEvent", new 
> TemperatureEventSchema(), props)).assignTimestampsAndWatermarks(new 
> IngestionTimeExtractor<>());
> DataStream 
> ds1=dstream.keyBy("rackId").timeWindow(Time.seconds(40), 
> Time.seconds(10)).sum("temperature");
> env.execute("Temperature Consumer");
> }
> How can I calculate the average temperature for the above example?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-1514) [Gelly] Add a Gather-Sum-Apply iteration method

2016-06-23 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346160#comment-15346160
 ] 

Vasia Kalavri edited comment on FLINK-1514 at 6/23/16 9:27 AM:
---

Hi [~michael], this GSA implementation is a variation of the [Powergraph 
abstraction | https://www.eecs.harvard.edu/cs261/papers/gonzalez-2012.pdf]. You 
can read more details about how it works in the [Gelly docs | 
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/gelly.html#gather-sum-apply-iterations].


was (Author: vkalavri):
Hi [~michael], tis GSA implementation is a variation of the [Powergraph 
abstraction | https://www.eecs.harvard.edu/cs261/papers/gonzalez-2012.pdf]. You 
can read more details about how it works in the [Gelly docs | 
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/gelly.html#gather-sum-apply-iterations].

> [Gelly] Add a Gather-Sum-Apply iteration method
> ---
>
> Key: FLINK-1514
> URL: https://issues.apache.org/jira/browse/FLINK-1514
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Daniel Bali
> Fix For: 0.9
>
>
> This will be a method that implements the GAS computation model, but without 
> the "scatter" step. The phases can be mapped into the following steps inside 
> a delta iteration:
> gather: a map on each < srcVertex, edge, trgVertex > that produces a partial 
> value
> sum: a reduce that combines the partial values
> apply: join with vertex set to update the vertex values using the results of 
> sum and the previous state.
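For anyone mapping the description above onto code, the single-source shortest paths example from the Gelly documentation boils down to roughly the following (a condensed sketch of the documented example; the graph construction and program setup around it are assumed):

{code:java}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.Vertex;
import org.apache.flink.graph.gsa.ApplyFunction;
import org.apache.flink.graph.gsa.GatherFunction;
import org.apache.flink.graph.gsa.Neighbor;
import org.apache.flink.graph.gsa.SumFunction;

// Gather produces a candidate distance per neighbor, sum keeps the minimum,
// apply updates the vertex value only if the new distance is smaller.
public class GSAShortestPathsSketch {

    static final class CalculateDistances extends GatherFunction<Double, Double, Double> {
        @Override
        public Double gather(Neighbor<Double, Double> neighbor) {
            return neighbor.getNeighborValue() + neighbor.getEdgeValue();
        }
    }

    static final class ChooseMinDistance extends SumFunction<Double, Double, Double> {
        @Override
        public Double sum(Double newValue, Double currentValue) {
            return Math.min(newValue, currentValue);
        }
    }

    static final class UpdateDistance extends ApplyFunction<Long, Double, Double> {
        @Override
        public void apply(Double newDistance, Double oldDistance) {
            if (newDistance < oldDistance) {
                setResult(newDistance);
            }
        }
    }

    // Usage, given a Graph<Long, Double, Double> whose vertex values are
    // initialized to 0.0 at the source and Double.MAX_VALUE elsewhere.
    static DataSet<Vertex<Long, Double>> run(Graph<Long, Double, Double> graph, int maxIterations) {
        return graph.runGatherSumApplyIteration(
                        new CalculateDistances(), new ChooseMinDistance(), new UpdateDistance(),
                        maxIterations)
                .getVertices();
    }
}
{code}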



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1514) [Gelly] Add a Gather-Sum-Apply iteration method

2016-06-23 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346160#comment-15346160
 ] 

Vasia Kalavri commented on FLINK-1514:
--

Hi [~michael], tis GSA implementation is a variation of the [Powergraph 
abstraction | https://www.eecs.harvard.edu/cs261/papers/gonzalez-2012.pdf]. You 
can read more details about how it works in the [Gelly docs | 
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/gelly.html#gather-sum-apply-iterations].

> [Gelly] Add a Gather-Sum-Apply iteration method
> ---
>
> Key: FLINK-1514
> URL: https://issues.apache.org/jira/browse/FLINK-1514
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Daniel Bali
> Fix For: 0.9
>
>
> This will be a method that implements the GAS computation model, but without 
> the "scatter" step. The phases can be mapped into the following steps inside 
> a delta iteration:
> gather: a map on each < srcVertex, edge, trgVertex > that produces a partial 
> value
> sum: a reduce that combines the partial values
> apply: join with vertex set to update the vertex values using the results of 
> sum and the previous state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1815) Add methods to read and write a Graph as adjacency list

2016-06-19 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338596#comment-15338596
 ] 

Vasia Kalavri commented on FLINK-1815:
--

Hi [~fobeligi], are you by any chance still working on this or shall I release 
it? Thanks!

> Add methods to read and write a Graph as adjacency list
> ---
>
> Key: FLINK-1815
> URL: https://issues.apache.org/jira/browse/FLINK-1815
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Faye Beligianni
>Priority: Minor
>
> It would be nice to add utility methods to read a graph from an Adjacency 
> list format and also write a graph in such a format.
> The simple case would be to read a graph with no vertex or edge values, where 
> we would need to define (a) a line delimiter, (b) a delimiter to separate 
> vertices from the neighbor list and (c) a delimiter to separate the neighbors.
> For example, "1 2,3,4\n2 1,3" would give vertex 1 with neighbors 2, 3 and 4 
> and vertex 2 with neighbors 1 and 3.
> If we have vertex values and/or edge values, we also need to have a way to 
> separate IDs from values. For example, we could have "1 0.1 2 0.5, 3 0.2" to 
> define a vertex 1 with value 0.1, edge (1, 2) with weight 0.5 and edge (1, 3) 
> with weight 0.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3618) Rename abstract UDF classes in Scatter-Gather implementation

2016-06-19 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338595#comment-15338595
 ] 

Vasia Kalavri commented on FLINK-3618:
--

[~greghogan] are you working on this? It would be nice to make this change for 
1.1.0. If you don't have time to finish this, let me know and I'll take over. 
Thanks!

> Rename abstract UDF classes in Scatter-Gather implementation
> 
>
> Key: FLINK-3618
> URL: https://issues.apache.org/jira/browse/FLINK-3618
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.0, 1.0.1
>Reporter: Martin Junghanns
>Assignee: Greg Hogan
>Priority: Minor
>
> We now offer three Vertex-centric computing abstractions:
> * Pregel
> * Gather-Sum-Apply
> * Scatter-Gather
> Each of these abstractions provides abstract classes that need to be 
> implemented by the user:
> * Pregel: {{ComputeFunction}}
> * GSA: {{GatherFunction}}, {{SumFunction}}, {{ApplyFunction}}
> * Scatter-Gather: {{MessagingFunction}}, {{VertexUpdateFunction}}
> In Pregel and GSA, the names of those functions follow the name of the 
> abstraction or the name suggested in the corresponding papers. For 
> consistency of the API, I propose to rename {{MessageFunction}} to 
> {{ScatterFunction}} and {{VertexUpdateFunction}} to {{GatherFunction}}.
> Also for consistency, I would like to change the parameter order in 
> {{Graph.runScatterGatherIteration(VertexUpdateFunction f1, MessagingFunction 
> f2}} to  {{Graph.runScatterGatherIteration(ScatterFunction f1, GatherFunction 
> f2}} (like in {{Graph.runGatherSumApplyFunction(...)}})



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3965) Delegating GraphAlgorithm

2016-06-19 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338593#comment-15338593
 ] 

Vasia Kalavri commented on FLINK-3965:
--

Hey [~greghogan], do we want this in for 1.1.0?

> Delegating GraphAlgorithm
> -
>
> Key: FLINK-3965
> URL: https://issues.apache.org/jira/browse/FLINK-3965
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Complex and related algorithms often overlap in computation of data. Two such 
> examples are:
> 1) the local and global clustering coefficients each use a listing of 
> triangles
> 2) the local clustering coefficient joins on vertex degree, and the 
> underlying triangle listing annotates edge degree which uses vertex degree
> We can reuse and rewrite algorithm output by creating a {{ProxyObject}} as a 
> delegate for method calls to the {{DataSet}} returned by the algorithm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2352) [Graph Visualization] Integrate Gelly with Gephi

2016-06-19 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-2352:
-
Assignee: (was: Shivani Ghatge)

> [Graph Visualization] Integrate Gelly with Gephi
> 
>
> Key: FLINK-2352
> URL: https://issues.apache.org/jira/browse/FLINK-2352
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>
> This integration will allow users to see the real-time progress of their 
> graph. They could also visually verify results for clustering algorithms, for 
> example. Gephi is free/open-source and provides support for all types of 
> networks, including dynamic and hierarchical graphs. 
> A first step would be to add the Gephi Toolkit to the pom.xml.
> https://github.com/gephi/gephi-toolkit
> Afterwards, a GraphBuilder similar to this one
> https://github.com/palmerabollo/test-twitter-graph/blob/master/src/main/java/es/guido/twitter/graph/GraphBuilder.java
> can be implemented. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-06-19 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338592#comment-15338592
 ] 

Vasia Kalavri commented on FLINK-3879:
--

Hey [~greghogan], since FLINK-2044 was updated to provide convergence and 
return both scores, do we still need this issue?

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm
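
For reference, the score updates themselves are simple; a minimal single-machine sketch of the HITS power iteration (plain Java, unrelated to the Gelly implementations discussed here) looks roughly like this:

{code}
import java.util.Arrays;

public class HitsSketch {
    public static void main(String[] args) {
        // Directed edges as (source, target) pairs over vertices 0..n-1.
        int n = 4;
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {3, 2}};

        double[] hub = new double[n];
        double[] auth = new double[n];
        Arrays.fill(hub, 1.0);
        Arrays.fill(auth, 1.0);

        for (int iteration = 0; iteration < 20; iteration++) {
            // Authority update: sum of hub scores of the vertices pointing to me.
            double[] newAuth = new double[n];
            for (int[] e : edges) {
                newAuth[e[1]] += hub[e[0]];
            }
            // Hub update: sum of authority scores of the vertices I point to.
            double[] newHub = new double[n];
            for (int[] e : edges) {
                newHub[e[0]] += newAuth[e[1]];
            }
            // Normalize so the scores do not grow without bound.
            auth = normalize(newAuth);
            hub = normalize(newHub);
        }
        System.out.println("hubs:        " + Arrays.toString(hub));
        System.out.println("authorities: " + Arrays.toString(auth));
    }

    private static double[] normalize(double[] v) {
        double norm = 0.0;
        for (double x : v) {
            norm += x * x;
        }
        norm = Math.sqrt(norm);
        for (int i = 0; i < v.length; i++) {
            v[i] /= norm;
        }
        return v;
    }
}
{code}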



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2941) Implement a neo4j - Flink/Gelly connector

2016-06-19 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri closed FLINK-2941.

Resolution: Won't Fix

The connector has been implemented by [~mju] in 
https://github.com/s1ck/flink-neo4j.

> Implement a neo4j - Flink/Gelly connector
> -
>
> Key: FLINK-2941
> URL: https://issues.apache.org/jira/browse/FLINK-2941
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Martin Junghanns
>  Labels: requires-design-doc
>
> By connecting Flink/Gelly with a graph database like neo4j we can facilitate 
> interesting use-cases, like:
> - use neo4j as an input source, i.e. read the complete graph or a subgraph 
> from a neo4j database, import it into Flink and run a graph analysis task 
> with Gelly.
> - use neo4j as a sink, i.e. perform ETL on some data in Flink to create a 
> graph and insert the graph in a neo4j database for further querying.
> We have started a discussion on possible implementations and have looked into 
> similar projects, e.g. connecting neo4j to Spark. Some initial thoughts and 
> experiences can be found in [this 
> document|https://docs.google.com/document/d/13qT_e-y8aTNWQnD43jRBq1074Y1LggPNDsic_Obwc28/edit?usp=sharing].
>  Please, feel free to comment and add ideas! I will also start a discussion 
> in the mailing list with more concrete problems that we would like to get 
> feedback on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2941) Implement a neo4j - Flink/Gelly connector

2016-06-19 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338588#comment-15338588
 ] 

Vasia Kalavri commented on FLINK-2941:
--

Hi,
is there any news regarding this? [~mju], are you going to keep this in your 
own GitHub repository or move it somewhere else? In any case, we should link it in 
https://flink.apache.org/community.html#third-party-packages. I will now close 
this issue.

> Implement a neo4j - Flink/Gelly connector
> -
>
> Key: FLINK-2941
> URL: https://issues.apache.org/jira/browse/FLINK-2941
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Martin Junghanns
>  Labels: requires-design-doc
>
> By connecting Flink/Gelly with a graph database like neo4j we can facilitate 
> interesting use-cases, like:
> - use neo4j as an input source, i.e. read the complete graph or a subgraph 
> from a neo4j database, import it into Flink and run a graph analysis task 
> with Gelly.
> - use neo4j as a sink, i.e. perform ETL on some data in Flink to create a 
> graph and insert the graph in a neo4j database for further querying.
> We have started a discussion on possible implementations and have looked into 
> similar projects, e.g. connecting neo4j to Spark. Some initial thoughts and 
> experiences can be found in [this 
> document|https://docs.google.com/document/d/13qT_e-y8aTNWQnD43jRBq1074Y1LggPNDsic_Obwc28/edit?usp=sharing].
>  Please, feel free to comment and add ideas! I will also start a discussion 
> in the mailing list with more concrete problems that we would like to get 
> feedback on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3916) Allow generic types passing the Table API

2016-06-15 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332354#comment-15332354
 ] 

Vasia Kalavri commented on FLINK-3916:
--

FLINK-3615 is about extending or refactoring {{sqlTypeToTypeInfo}}. I do not 
recall what kind of trouble it had caused at that time, but it looks like we 
were thinking of refactoring the code to determine return types per operator.

> Allow generic types passing the Table API
> -
>
> Key: FLINK-3916
> URL: https://issues.apache.org/jira/browse/FLINK-3916
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> The Table API currently only allows BasicTypes to pass through it. Other types 
> should also be supported, but treated as black boxes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2016-05-29 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306054#comment-15306054
 ] 

Vasia Kalavri commented on FLINK-1707:
--

Thank you [~joseprupi]! When you are ready, open a PR and let me know if you 
need any help.

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Josep Rubió
>Priority: Minor
>  Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation 
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric 
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1536) Graph partitioning operators for Gelly

2016-05-29 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306053#comment-15306053
 ] 

Vasia Kalavri commented on FLINK-1536:
--

Hi [~rohit13k],
a graph in Gelly is represented by two DataSets: one for vertices and one for edges. 
DataSets are distributed, immutable collections of data, so vertex or edge additions 
and removals return a new Graph.
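
A minimal sketch of what that means in practice (assuming the usual Gelly imports):

{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

List<Edge<Long, Double>> edges = Arrays.asList(
        new Edge<Long, Double>(1L, 2L, 1.0),
        new Edge<Long, Double>(2L, 3L, 1.0));

// Graph<vertex id, vertex value, edge value>; vertex values default to NullValue here.
Graph<Long, NullValue, Double> graph = Graph.fromCollection(edges, env);

// The original graph is untouched; a new Graph (new vertex/edge DataSets) is returned.
Graph<Long, NullValue, Double> larger = graph.addEdge(
        new Vertex<Long, NullValue>(3L, NullValue.getInstance()),
        new Vertex<Long, NullValue>(4L, NullValue.getInstance()),
        1.0);

System.out.println(graph.numberOfEdges());   // 2
System.out.println(larger.numberOfEdges());  // 3
{code}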

> Graph partitioning operators for Gelly
> --
>
> Key: FLINK-1536
> URL: https://issues.apache.org/jira/browse/FLINK-1536
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Priority: Minor
>
> Smart graph partitioning can significantly improve the performance and 
> scalability of graph analysis applications. Depending on the computation 
> pattern, a graph partitioning algorithm divides the graph into (maybe 
> overlapping) subgraphs, optimizing some objective. For example, if 
> communication is performed across graph edges, one might want to minimize the 
> edges that cross from one partition to another.
> Graph partitioning is a well-studied problem and several 
> algorithms have been proposed in the literature. The goal of this project 
> would be to choose a few existing partitioning techniques and implement the 
> corresponding graph partitioning operators for Gelly.
> Some related literature can be found [here| 
> http://www.citeulike.org/user/vasiakalavri/tag/graph-partitioning].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-25 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-2044.
--
   Resolution: Implemented
Fix Version/s: 1.1.0

> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
> Fix For: 1.1.0
>
>
> Implementation of the HITS algorithm in the Gelly API using Java. The feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1536) Graph partitioning operators for Gelly

2016-05-17 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286478#comment-15286478
 ] 

Vasia Kalavri commented on FLINK-1536:
--

Hi [~rohit13k],
thank you for your interest. As far as I know nobody is currently working on 
this JIRA. Bear in mind that Gelly is built on top of the Flink DataSet API, 
i.e. it only supports static graphs, so I don't think this JIRA is the right 
place to start if you want to experiment with dynamic graph partitioning.
You might want to take a look at gelly-stream, a WIP graph streaming API that 
[~senorcarbone] and I have been working on. You can find our code 
[here|https://github.com/vasia/gelly-streaming]. We are currently working with 
a student on adding stream partitioning. If you're interested, feel free to 
start a discussion on that repository.

> Graph partitioning operators for Gelly
> --
>
> Key: FLINK-1536
> URL: https://issues.apache.org/jira/browse/FLINK-1536
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Priority: Minor
>
> Smart graph partitioning can significantly improve the performance and 
> scalability of graph analysis applications. Depending on the computation 
> pattern, a graph partitioning algorithm divides the graph into (maybe 
> overlapping) subgraphs, optimizing some objective. For example, if 
> communication is performed across graph edges, one might want to minimize the 
> edges that cross from one partition to another.
> Graph partitioning is a well-studied problem and several 
> algorithms have been proposed in the literature. The goal of this project 
> would be to choose a few existing partitioning techniques and implement the 
> corresponding graph partitioning operators for Gelly.
> Some related literature can be found [here| 
> http://www.citeulike.org/user/vasiakalavri/tag/graph-partitioning].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2375) Add Approximate Adamic Adar Similarity method using BloomFilters

2016-05-11 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri closed FLINK-2375.

Resolution: Won't Fix

This technique can be considered as part of FLINK-3898.

> Add Approximate Adamic Adar Similarity method using BloomFilters
> 
>
> Key: FLINK-2375
> URL: https://issues.apache.org/jira/browse/FLINK-2375
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Reporter: Shivani Ghatge
>Assignee: Shivani Ghatge
>Priority: Minor
>
> Just as Jaccard, the Adamic-Adar algorithm measures the similarity between a 
> set of nodes. However, instead of counting the common neighbors and dividing 
> them by the total number of neighbors, the similarity is weighted according 
> to the vertex degrees: each common neighbor n contributes 1/log(k_n), where 
> k_n is the degree of n.
> The Adamic-Adar algorithm can be broken into three steps:
> 1). For each vertex, compute the inverse logarithm of its degree (as above) and 
> set it as the vertex value.
> 2). Each vertex will then send this new computed value along with a list of its 
> neighbors to the targets of its out-edges.
> 3). Weigh the edges with the Adamic-Adar index: sum 1/log(k_n) over all n in CN, 
> where CN is the set of common neighbors of the two vertices x, y. See [2].
> Using BloomFilters we increase the scalability of the algorithm. The values 
> calculated for the edges will be approximate.
> Prerequisites:
> Full understanding of the Jaccard Similarity Measure algorithm
> Reading the associated literature:
> [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf
> [2] 
> http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2634) Add a Vertex-centric Version of the Triangle Count Library Method

2016-05-11 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri closed FLINK-2634.

Resolution: Won't Fix

There is currently no need or demand for this version. The current 
TriangleEnumerator covers this functionality.

> Add a Vertex-centric Version of the Triangle Count Library Method
> 
>
> Key: FLINK-2634
> URL: https://issues.apache.org/jira/browse/FLINK-2634
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>Assignee: Andra Lungu
>Priority: Minor
>
> The vertex-centric version of this algorithm receives an undirected graph as 
> input and outputs the total number of triangles formed by the graph's edges.
> The implementation consists of three phases:
> 1). Select neighbours with id greater than the current vertex id.
> 2). Propagate each received value to neighbours with higher id. 
> 3). Compute the number of triangles by verifying if the final vertex contains 
> the sender's id in its list.
> As opposed to the GAS version, all three steps will be performed via 
> message passing.
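
The id-ordering in step 1 is what guarantees each triangle is counted exactly once. A minimal single-machine sketch of that idea (plain Java, not the message-passing version proposed here):

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TriangleCountSketch {
    public static void main(String[] args) {
        // Undirected edges of a small graph: a triangle 1-2-3 plus a pendant edge 3-4.
        int[][] edges = {{1, 2}, {2, 3}, {1, 3}, {3, 4}};

        // Build adjacency sets.
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }

        long triangles = 0;
        for (Map.Entry<Integer, Set<Integer>> entry : adj.entrySet()) {
            int u = entry.getKey();
            // Phase 1 analogue: only consider neighbours with a higher id than u.
            List<Integer> higher = new ArrayList<>();
            for (int v : entry.getValue()) {
                if (v > u) {
                    higher.add(v);
                }
            }
            // Phase 2/3 analogue: for each pair (v, w) of higher-id neighbours,
            // a triangle is closed iff the edge (v, w) exists.
            for (int i = 0; i < higher.size(); i++) {
                for (int j = i + 1; j < higher.size(); j++) {
                    if (adj.get(higher.get(i)).contains(higher.get(j))) {
                        triangles++;
                    }
                }
            }
        }
        System.out.println(triangles); // 1
    }
}
{code}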



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3898) Adamic-Adar Similarity

2016-05-11 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280260#comment-15280260
 ] 

Vasia Kalavri commented on FLINK-3898:
--

I guess we can close FLINK-2310 as a duplicate? I know that nobody is working 
on it.

> Adamic-Adar Similarity
> --
>
> Key: FLINK-3898
> URL: https://issues.apache.org/jira/browse/FLINK-3898
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> The implementation of Adamic-Adar Similarity [0] is very close to Jaccard 
> Similarity. Whereas Jaccard Similarity counts common neighbors, Adamic-Adar 
> Similarity sums the inverse logarithm of the degree of common neighbors.
> Consideration will be given to the computation of the inverse logarithm, in 
> particular whether to pre-compute a small array of values.
> [0] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf
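
A plain-Java sketch of the score as defined above (the sum of 1/log(degree) over the common neighbors); it only illustrates the formula and is unrelated to the Gelly implementation:

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AdamicAdarSketch {

    /** Adamic-Adar weight of (x, y): sum over common neighbors n of 1 / log(degree(n)). */
    static double adamicAdar(Set<Integer> neighborsOfX, Set<Integer> neighborsOfY,
                             Map<Integer, Integer> degree) {
        double score = 0.0;
        for (int n : neighborsOfX) {
            if (neighborsOfY.contains(n)) {          // n is a common neighbor
                score += 1.0 / Math.log(degree.get(n));
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // Tiny undirected example with edges 1-3, 2-3, 1-4, 2-4, 3-4.
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        adj.put(1, new HashSet<>(Arrays.asList(3, 4)));
        adj.put(2, new HashSet<>(Arrays.asList(3, 4)));
        adj.put(3, new HashSet<>(Arrays.asList(1, 2, 4)));
        adj.put(4, new HashSet<>(Arrays.asList(1, 2, 3)));

        Map<Integer, Integer> degree = new HashMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : adj.entrySet()) {
            degree.put(e.getKey(), e.getValue().size());
        }

        // Common neighbors of 1 and 2 are {3, 4}, each of degree 3:
        // score = 2 / log(3), roughly 1.82.
        System.out.println(adamicAdar(adj.get(1), adj.get(2), degree));
    }
}
{code}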



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-11 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280246#comment-15280246
 ] 

Vasia Kalavri commented on FLINK-3879:
--

[~greghogan]
- Do we agree that the PR for FLINK-2044 is now in a good state and could be 
merged? Or would you rather benchmark this against it and go for the most 
performant one?
- Gelly library methods: currently there are scatter-gather and GSA 
implementations for PageRank, Connected Components, and SSSP. We have these 
because GSA performs better for graphs with skewed degree distributions. In the 
Gelly docs ([iteration abstractions 
comparison|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/gelly.html#iteration-abstractions-comparison]) 
we describe when GSA should be preferred over scatter-gather. Maybe we can 
make this more explicit.
There is no Pregel implementation (only in examples). The {{GSATriangleCount}} 
library method has proved to be very inefficient and should be removed imo 
(I'll open a JIRA).
- I'm not sure what you mean by "approximate HITS"?

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-11 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279918#comment-15279918
 ] 

Vasia Kalavri commented on FLINK-3879:
--

Gelly has multiple implementations for some algorithms to showcase how the 
different iteration abstractions can be used. Also, for some graph inputs an 
implementation might perform better than another (e.g. scatter-gather vs. GSA). 
That doesn't mean we should add multiple implementations for all new algorithms 
:)
Now regarding performance, I'm not quite sure that FLINK-3879 will perform 
better than FLINK-2044. I haven't looked at the PR in detail, but I saw that it 
uses a bulk iteration. That means that a new partial solution is generated in 
every iteration and we cannot take advantage of the asymmetric convergence (if 
any).
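
For context, the difference is visible directly in the DataSet API: a bulk iteration feeds the entire partial solution back into the next superstep, while a delta iteration only feeds back a (hopefully shrinking) workset. A rough sketch, assuming the usual Flink DataSet imports; the decrement/filter logic is just a toy example:

{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Bulk iteration: the full partial solution is recomputed in every superstep.
IterativeDataSet<Long> bulk = env.fromElements(0L).iterate(10);
DataSet<Long> bulkResult = bulk.closeWith(
        bulk.map(new MapFunction<Long, Long>() {
            public Long map(Long value) {
                return value + 1;
            }
        }));
bulkResult.print(); // 10

// Delta iteration: only changed elements stay in the workset; elements that
// stop changing drop out, which is what allows asymmetric convergence.
DataSet<Tuple2<Long, Long>> initial = env.fromElements(
        new Tuple2<Long, Long>(1L, 5L), new Tuple2<Long, Long>(2L, 2L));
DeltaIteration<Tuple2<Long, Long>, Tuple2<Long, Long>> delta =
        initial.iterateDelta(initial, 10, 0); // field 0 is the solution-set key

DataSet<Tuple2<Long, Long>> changes = delta.getWorkset()
        .map(new MapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>>() {
            public Tuple2<Long, Long> map(Tuple2<Long, Long> t) {
                return new Tuple2<Long, Long>(t.f0, t.f1 - 1); // decrement the counter
            }
        })
        .filter(new FilterFunction<Tuple2<Long, Long>>() {
            public boolean filter(Tuple2<Long, Long> t) {
                return t.f1 > 0; // converged elements leave the workset
            }
        });

// closeWith(solution set delta, next workset); terminates early once the workset is empty.
DataSet<Tuple2<Long, Long>> deltaResult = delta.closeWith(changes, changes);
deltaResult.print();
{code}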


> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-09 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276102#comment-15276102
 ] 

Vasia Kalavri commented on FLINK-3879:
--

Hey [~greghogan], [~gallenvara_bg],
what's the plan with this issue and FLINK-2044? We now have 2 JIRAs and 2 
implementations for the same algorithm.
Is the intention to only keep one or both? Do they provide substantially 
different functionality? Thanks!

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-06 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274270#comment-15274270
 ] 

Vasia Kalavri commented on FLINK-3879:
--

So, it provides the same functionality as FLINK-2044, but it's more efficient?

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-05-06 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274254#comment-15274254
 ] 

Vasia Kalavri commented on FLINK-3879:
--

Hi [~greghogan],
Can you please extend the description a bit? How is this algorithm different from 
what FLINK-2044 proposes and what's the motivation for adding it to Gelly?
Thanks!

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2926) Add a Strongly Connected Components Library Method

2016-05-06 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273933#comment-15273933
 ] 

Vasia Kalavri commented on FLINK-2926:
--

Hi [~mliesenberg],
delta iteration by default finishes when the workset is empty, but I don't see 
why it couldn't support a custom convergence criterion also. I thought this 
method was there already. In fact the [iterations 
guide|https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html]
 states that the delta iteration supports "Custom aggregator convergence" so 
that's weird. Can you please open an issue for that? Thanks!
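
For reference, this is roughly what the documented "custom aggregator convergence" looks like for a *bulk* iteration; the point above is that the delta iteration currently only offers the empty-workset check. Sketch only, assuming the usual Flink imports; the aggregator name and the toy step function are made up:

{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
IterativeDataSet<Long> iteration = env.fromElements(0L).iterate(100);

// Stop as soon as a superstep aggregates zero "updates".
iteration.registerAggregationConvergenceCriterion(
        "updates",
        new LongSumAggregator(),
        new ConvergenceCriterion<LongValue>() {
            @Override
            public boolean isConverged(int superstep, LongValue value) {
                return value.getValue() == 0;
            }
        });

DataSet<Long> result = iteration.closeWith(
        iteration.map(new RichMapFunction<Long, Long>() {
            private LongSumAggregator updates;

            @Override
            public void open(Configuration parameters) {
                updates = getIterationRuntimeContext().getIterationAggregator("updates");
            }

            @Override
            public Long map(Long value) {
                updates.aggregate(1); // a superstep that aggregates 0 triggers the criterion above
                return value + 1;
            }
        }));
{code}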

> Add a Strongly Connected Components Library Method
> --
>
> Key: FLINK-2926
> URL: https://issues.apache.org/jira/browse/FLINK-2926
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>Assignee: Martin Liesenberg
>Priority: Minor
>  Labels: requires-design-doc
>
> This algorithm operates in four main steps: 
> 1). Form the transposed graph (each vertex sends its id to its out neighbors 
> which form a transposedNeighbors set)
> 2). Trimming: every vertex which has only incoming or outgoing edges sets 
> colorID to its own value and becomes inactive. 
> 3). Forward traversal: 
>Start phase: propagate id to out neighbors 
>Rest phase: update the colorID with the minimum value seen 
> until convergence
> 4). Backward traversal: 
>  Start: if the vertex id is equal to its color id 
> propagate the value to transposedNeighbors
>  Rest: each vertex that receives a message equal to its 
> colorId will propagate its colorId to the transposed graph and becomes 
> inactive. 
> More info in section 3.1 of this paper: 
> http://ilpubs.stanford.edu:8090/1077/3/p535-salihoglu.pdf
> or in section 6 of this paper: http://www.vldb.org/pvldb/vol7/p1821-yan.pdf  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3860) WikipediaEditsSourceTest.testWikipediaEditsSource times out

2016-05-03 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-3860:


 Summary: WikipediaEditsSourceTest.testWikipediaEditsSource times 
out
 Key: FLINK-3860
 URL: https://issues.apache.org/jira/browse/FLINK-3860
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.1.0
Reporter: Vasia Kalavri


WikipediaEditsSourceTest.testWikipediaEditsSource consistently timed out on my 
latest Travis build.
See logs [here|https://travis-ci.org/vasia/flink/builds/127446209].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3793) Re-organize the Table API and SQL docs

2016-05-03 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-3793.
--
   Resolution: Fixed
Fix Version/s: 1.1.0

> Re-organize the Table API and SQL docs
> --
>
> Key: FLINK-3793
> URL: https://issues.apache.org/jira/browse/FLINK-3793
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 1.1.0
>
>
> Now that we have added SQL and soon streaming SQL support, we need to 
> reorganize the Table API documentation. 
> - The current guide is under "apis/batch/libs". We should either split it 
> into a streaming and a batch part or move it under "apis". The second 
> option might be preferable, as the batch and stream APIs have a lot in common.
> - The current guide has separate sections for Java and Scala APIs. These can 
> be merged and organized with tabs, like other parts of the docs.
> - Mentions of "Table API" can be renamed to "Table API and SQL", e.g. in the 
> software stack figure and homepage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266206#comment-15266206
 ] 

Vasia Kalavri commented on FLINK-2044:
--

Thanks, I'll assign it to you :)

> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Priority: Minor
>
> Implementation of the HITS algorithm in the Gelly API using Java. The feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-2044:
-
Assignee: GaoLun

> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of the HITS algorithm in the Gelly API using Java. The feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2926) Add a Strongly Connected Components Library Method

2016-05-02 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266205#comment-15266205
 ] 

Vasia Kalavri commented on FLINK-2926:
--

Hi [~mliesenberg],
thank you for the design document. If I understand correctly, the algorithm 
requires a nested iteration, right?
Inside each round, we need to run two Pregel steps (forward and backward 
propagation) and a graph decomposition step. Currently, this is not easy to 
express in Flink. Our Pregel abstraction does not support multiple phases and 
does not have a master computation like Giraph does. We will probably need to 
create a custom delta iteration and handle the different phases with 
aggregators. Would you like to give it a try and sketch what this delta 
iteration would look like?

> Add a Strongly Connected Components Library Method
> --
>
> Key: FLINK-2926
> URL: https://issues.apache.org/jira/browse/FLINK-2926
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>Assignee: Martin Liesenberg
>Priority: Minor
>  Labels: requires-design-doc
>
> This algorithm operates in four main steps: 
> 1). Form the transposed graph (each vertex sends its id to its out neighbors 
> which form a transposedNeighbors set)
> 2). Trimming: every vertex which has only incoming or outgoing edges sets 
> colorID to its own value and becomes inactive. 
> 3). Forward traversal: 
>Start phase: propagate id to out neighbors 
>Rest phase: update the colorID with the minimum value seen 
> until convergence
> 4). Backward traversal: 
>  Start: if the vertex id is equal to its color id 
> propagate the value to transposedNeighbors
>  Rest: each vertex that receives a message equal to its 
> colorId will propagate its colorId to the transposed graph and becomes 
> inactive. 
> More info in section 3.1 of this paper: 
> http://ilpubs.stanford.edu:8090/1077/3/p535-salihoglu.pdf
> or in section 6 of this paper: http://www.vldb.org/pvldb/vol7/p1821-yan.pdf  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

