[jira] [Commented] (FLINK-2310) Add an Adamic-Adar Similarity example

2015-09-12 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742234#comment-14742234 ]

ASF GitHub Bot commented on FLINK-2310:
---

Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/892#issuecomment-139822550
  
I think it's safe to open a fresh PR... fixing this one would be overkill. 
You can also update and rebase #923 so that we can review them. 


> Add an Adamic-Adar Similarity example
> -
>
> Key: FLINK-2310
> URL: https://issues.apache.org/jira/browse/FLINK-2310
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Reporter: Andra Lungu
>Assignee: Shivani Ghatge
>Priority: Minor
>
> Just like Jaccard, the Adamic-Adar algorithm measures the similarity between a 
> pair of nodes. However, instead of counting the common neighbors and dividing 
> them by the total number of neighbors, the similarity is weighted according 
> to the vertex degrees. In particular, each common neighbor contributes 
> log(1/degree) to the score.
> The Adamic-Adar algorithm can be broken into three steps: 
> 1). For each vertex, compute the log of its inverse degree (with the formula 
> above) and set it as the vertex value. 
> 2). Each vertex then sends this newly computed value, along with a list of its 
> neighbors, to the targets of its out-edges.
> 3). Weigh the edges with the Adamic-Adar index: the sum over n in CN of 
> log(1/k_n), where CN is the set of all common neighbors of the two vertices x 
> and y, and k_n is the degree of node n. See [2].
> Prerequisites: 
> - Full understanding of the Jaccard Similarity Measure algorithm
> - Reading the associated literature: 
> [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf
> [2] 
> http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar
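Below is a minimal, self-contained sketch of the three steps described in this issue, written in plain Java over an adjacency map rather than the Gelly API, and following the log(1/k_n) formula exactly as stated above (the class and method names are illustrative only):

{code:java}
import java.util.*;

public class AdamicAdarSketch {

    // For an edge (x, y), sum log(1/k_n) over all common neighbors n of x and y,
    // where k_n is the degree of n (steps 1 and 3 collapsed into one pass).
    static double adamicAdar(Map<Long, Set<Long>> adjacency, long x, long y) {
        double weight = 0.0;
        for (long n : adjacency.get(x)) {
            if (adjacency.get(y).contains(n)) {       // n is a common neighbor
                double kn = adjacency.get(n).size();  // degree of n
                weight += Math.log(1.0 / kn);         // log of the inverse degree
            }
        }
        return weight;
    }

    public static void main(String[] args) {
        // Tiny undirected graph: 1-2, 1-3, 2-3, 3-4
        Map<Long, Set<Long>> adj = new HashMap<>();
        adj.put(1L, new HashSet<>(Arrays.asList(2L, 3L)));
        adj.put(2L, new HashSet<>(Arrays.asList(1L, 3L)));
        adj.put(3L, new HashSet<>(Arrays.asList(1L, 2L, 4L)));
        adj.put(4L, new HashSet<>(Collections.singletonList(3L)));

        // Edge (1, 2) has the single common neighbor 3, which has degree 3,
        // so its Adamic-Adar weight is log(1/3).
        System.out.println(adamicAdar(adj, 1L, 2L));
    }
}
{code}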



[jira] [Commented] (FLINK-2634) Add a Vertex-centric Version of the Triangle Count Library Method

2015-09-12 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742229#comment-14742229 ]

ASF GitHub Bot commented on FLINK-2634:
---

Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/1105#issuecomment-139820404
  
If it's okay with you, I'd like to see what @vasia has to say about adding 
the Triangle Count example from the DataSet API plus a reduce as a library 
method. IMO, it's a better addition, but for some reason we preferred the 
Pregel/GSA implementations at a certain point (partly because my thesis takes 
Pregel as a baseline). 

Also, the generic K instead of Long makes perfect sense. However, if we 
decide to change it, I'll have to open a JIRA to revise the entire set of 
library methods because, apart from PageRank, I think they all restrict 
themselves to one concrete key type. I would be a bit worried about type erasure 
in the "K implements Key" case :-S 
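Purely to illustrate the generic-key point, here is a tiny sketch with hypothetical interfaces and class names (these are not Gelly's actual library method signatures):

{code:java}
public class KeyTypeSketch {

    // Hypothetical stand-in for a library method contract; not a Gelly interface.
    interface LibraryMethod<K, R> {
        R run(Iterable<K> vertexIds);
    }

    // Current situation: the key type is hard-wired to Long.
    static class TriangleCountLong implements LibraryMethod<Long, Long> {
        public Long run(Iterable<Long> vertexIds) {
            long count = 0; // counting logic omitted
            return count;
        }
    }

    // Proposed shape: the same method parameterized on a comparable key type K
    // (comparability is needed for the "neighbor with a higher id" selection).
    static class TriangleCountGeneric<K extends Comparable<K>> implements LibraryMethod<K, Long> {
        public Long run(Iterable<K> vertexIds) {
            long count = 0; // identical logic, independent of the concrete key type
            return count;
        }
    }

    public static void main(String[] args) {
        // The erasure concern mentioned above: at runtime the concrete K is gone,
        // so it cannot be recovered from the instance alone.
        System.out.println(new TriangleCountGeneric<Long>().getClass()
                == new TriangleCountGeneric<String>().getClass()); // prints true
    }
}
{code}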


> Add a Vertex-centric Version of the Triangle Count Library Method
> 
>
> Key: FLINK-2634
> URL: https://issues.apache.org/jira/browse/FLINK-2634
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Andra Lungu
>Assignee: Andra Lungu
>Priority: Minor
>
> The vertex-centric version of this algorithm receives an undirected graph as 
> input and outputs the total number of triangles formed by the graph's edges.
> The implementation consists of three phases:
> 1). Select the neighbours with an id greater than the current vertex id.
> 2). Propagate each received value to the neighbours with a higher id. 
> 3). Compute the number of triangles by verifying whether the final vertex 
> contains the original sender's id in its neighbour list.
> As opposed to the GAS version, all three steps will be performed via 
> message passing. 
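A minimal, self-contained sketch of these three supersteps, simulating the message passing in plain Java over an adjacency map (this is not the Gelly vertex-centric API, just an illustration of the phases):

{code:java}
import java.util.*;

public class VertexCentricTriangleCountSketch {

    public static void main(String[] args) {
        // Undirected graph: 1-2, 1-3, 2-3, 3-4 (exactly one triangle: 1-2-3).
        Map<Long, Set<Long>> adj = new HashMap<>();
        addEdge(adj, 1L, 2L);
        addEdge(adj, 1L, 3L);
        addEdge(adj, 2L, 3L);
        addEdge(adj, 3L, 4L);

        // Superstep 1: every vertex sends its own id to neighbours with a higher id.
        // Messages are modeled as (target vertex -> list of received ids).
        Map<Long, List<Long>> inbox1 = new HashMap<>();
        for (long v : adj.keySet()) {
            for (long n : adj.get(v)) {
                if (n > v) {
                    inbox1.computeIfAbsent(n, k -> new ArrayList<>()).add(v);
                }
            }
        }

        // Superstep 2: every vertex forwards each received id to neighbours with a higher id.
        Map<Long, List<Long>> inbox2 = new HashMap<>();
        for (Map.Entry<Long, List<Long>> e : inbox1.entrySet()) {
            long v = e.getKey();
            for (long received : e.getValue()) {
                for (long n : adj.get(v)) {
                    if (n > v) {
                        inbox2.computeIfAbsent(n, k -> new ArrayList<>()).add(received);
                    }
                }
            }
        }

        // Superstep 3: a vertex that receives an id it is directly connected to closes a triangle.
        long triangles = 0;
        for (Map.Entry<Long, List<Long>> e : inbox2.entrySet()) {
            long v = e.getKey();
            for (long received : e.getValue()) {
                if (adj.get(v).contains(received)) {
                    triangles++;
                }
            }
        }
        System.out.println(triangles); // prints 1
    }

    static void addEdge(Map<Long, Set<Long>> adj, long a, long b) {
        adj.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }
}
{code}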




[jira] [Commented] (FLINK-2661) Add a Node Splitting Technique to Overcome the Limitations of Skewed Graphs

2015-09-12 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742216#comment-14742216 ]

ASF GitHub Bot commented on FLINK-2661:
---

GitHub user andralungu opened a pull request:

https://github.com/apache/flink/pull/1124

[FLINK-2661] Added Node Splitting Methods

Social media graphs, citation networks and even protein networks have a 
common property: their degree distribution follows a power-law curve. This 
structure raises challenges for both the vertex-centric and the GSA/GAS models 
because they process vertices uniformly, regardless of their degree 
distribution. In synchronous execution, this leads to long stalls: vertices 
wait for the skewed nodes to finish their computation. 

This PR aims to diminish the impact of high-degree nodes by proposing four 
main functions: `determinieSkewedVertices`, `treeDeAggregate` (splits a node 
into subnodes, recursively, in levels), `propagateValuesToSplitVertices` 
(useful when the algorithm performs more than one superstep) and `treeAggregate` 
(brings the graph back to its initial state). 

These functions modify a graph at a high level, making its degree 
distribution more uniform. The method does not need any special partitioning or 
runtime modification and, for skewed networks and computationally intensive 
algorithms, can speed up the run time by a factor of two. 

I added an example, NodeSplittingJaccardSimilarityMeasure, for which I 
needed to split the overall sequence of operations into two functions in order 
to test the result. Calling the entire main method would have resulted in the 
"Too few memory segments ..." exception - in other words, too many operations 
called within one test. 

For more info, please consult the additional entry in the documentation. 

If we reach an agreement and this PR gets merged, I will also follow 
@fhueske's suggestion from the mailing list and add a Split version of each 
of the library methods so that users can verify whether their run time 
improves. 
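To make the split/merge idea concrete, here is a small, self-contained sketch in plain Java. It is only an illustration of the concept, not the code in this PR; the threshold, the fan-out and the sum combiner are arbitrary choices:

{code:java}
import java.util.*;

public class NodeSplittingSketch {

    // Mirrors the idea of determining skewed vertices: a vertex is "skewed" if its
    // degree exceeds a threshold (here a fixed constant, purely for illustration).
    static boolean isSkewed(Set<Long> neighbors, int threshold) {
        return neighbors.size() > threshold;
    }

    // Mirrors the idea of de-aggregation (a single level only): distribute the edges
    // of a skewed vertex round-robin over a number of subvertices.
    static List<Set<Long>> split(Set<Long> neighbors, int subvertices) {
        List<Set<Long>> parts = new ArrayList<>();
        for (int i = 0; i < subvertices; i++) {
            parts.add(new HashSet<>());
        }
        int i = 0;
        for (long n : neighbors) {
            parts.get(i++ % subvertices).add(n);
        }
        return parts;
    }

    public static void main(String[] args) {
        // A skewed vertex with 10 neighbors.
        Set<Long> neighbors = new HashSet<>();
        for (long n = 1; n <= 10; n++) {
            neighbors.add(n);
        }

        if (isSkewed(neighbors, 4)) {
            // Each subvertex now processes only a slice of the edges, so no single
            // task has to wait for one huge neighborhood to be consumed.
            List<Set<Long>> parts = split(neighbors, 3);

            // Per-subvertex partial result (here simply the partial degree).
            List<Integer> partials = new ArrayList<>();
            for (Set<Long> part : parts) {
                partials.add(part.size());
            }

            // Mirrors the idea of aggregating back: a user-defined combiner merges
            // the partial results into the value of the original vertex.
            int merged = partials.stream().mapToInt(Integer::intValue).sum();
            System.out.println("subvertex partials = " + partials + ", merged = " + merged);
        }
    }
}
{code}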

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andralungu/flink splitJaccardFlink

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1124


commit b02d0917edcd5f3c8846fe01044afd7444a58c08
Author: Andra Lungu 
Date:   2015-09-12T10:25:20Z

[FLINK-2661] Added Node Splitting Methods

[FLINK-2661] Minor modifications in the docs

[FLINK-2661] pdf to png




> Add a Node Splitting Technique to Overcome the Limitations of Skewed Graphs
> ---
>
> Key: FLINK-2661
> URL: https://issues.apache.org/jira/browse/FLINK-2661
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Andra Lungu
>Assignee: Andra Lungu
>
> Skewed graphs raise unique challenges for computation models such as Gelly's 
> vertex-centric or GSA iterations. This is mainly because these approaches 
> process vertices uniformly, regardless of their degree distribution. 
> In the vertex-centric model, for instance, a skewed node will take more time 
> to process its neighbors than the other nodes in the graph. The former will 
> act as a straggler, causing the latter to remain idle until it finishes its 
> computation. 
> This issue can be mitigated by splitting a high-degree node into subnodes and 
> evenly distributing its edges to the resulting subvertices. The 
> computation will then be performed on the split vertex. 
> To this end, we should add a Splitting API on top of Gelly which can help:
> - determine skewed nodes 
> - split them
> - merge them back at the end of the computation, given a user-defined 
> combiner.
> To illustrate the usage of these methods, we should add an example as well as 
> a separate entry in the documentation.



[jira] [Created] (FLINK-2662) CompilerException: "Bug: Plan generation for Unions picked a ship strategy between binary plan operators."

2015-09-12 Thread Gabor Gevay (JIRA)
Gabor Gevay created FLINK-2662:
--

 Summary: CompilerException: "Bug: Plan generation for Unions 
picked a ship strategy between binary plan operators."
 Key: FLINK-2662
 URL: https://issues.apache.org/jira/browse/FLINK-2662
 Project: Flink
  Issue Type: Bug
  Components: Optimizer
Affects Versions: master
Reporter: Gabor Gevay


I have a Flink program which throws the exception in the jira title. Full text:


Exception in thread "main" org.apache.flink.optimizer.CompilerException: Bug: Plan generation for Unions picked a ship strategy between binary plan operators.
	at org.apache.flink.optimizer.traversals.BinaryUnionReplacer.collect(BinaryUnionReplacer.java:113)
	at org.apache.flink.optimizer.traversals.BinaryUnionReplacer.postVisit(BinaryUnionReplacer.java:72)
	at org.apache.flink.optimizer.traversals.BinaryUnionReplacer.postVisit(BinaryUnionReplacer.java:41)
	at org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:170)
	at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
	at org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:163)
	at org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:163)
	at org.apache.flink.optimizer.plan.WorksetIterationPlanNode.acceptForStepFunction(WorksetIterationPlanNode.java:194)
	at org.apache.flink.optimizer.traversals.BinaryUnionReplacer.preVisit(BinaryUnionReplacer.java:49)
	at org.apache.flink.optimizer.traversals.BinaryUnionReplacer.preVisit(BinaryUnionReplacer.java:41)
	at org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:162)
	at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
	at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
	at org.apache.flink.optimizer.plan.OptimizedPlan.accept(OptimizedPlan.java:127)
	at org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:520)
	at org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:402)
	at org.apache.flink.client.LocalExecutor.getOptimizerPlanAsJSON(LocalExecutor.java:202)
	at org.apache.flink.api.java.LocalEnvironment.getExecutionPlan(LocalEnvironment.java:63)
	at malom.Solver.main(Solver.java:66)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)


The execution plan:
http://compalg.inf.elte.hu/~ggevay/flink/plan_3_4_0_0_without_verif.txt
(I obtained this by commenting out the line that throws the exception)


The code is here:
https://github.com/ggevay/flink/tree/plan-generation-bug
The class to run is "Solver". It needs a command-line argument: a directory 
where it will write its output. (On the first run, it spends a few minutes 
generating some lookup tables, which are then placed in /tmp/movegen.)



[jira] [Commented] (FLINK-2537) Add scala examples.jar to build-target/examples

2015-09-12 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742087#comment-14742087 ]

ASF GitHub Bot commented on FLINK-2537:
---

GitHub user chenliang613 opened a pull request:

https://github.com/apache/flink/pull/1123

[FLINK-2537] Add scala examples.jar to build-target/examples

Scala, as a functional programming language, is being adopted by more and 
more developers, and some newcomers may want to modify the Scala examples' code 
to better understand Flink's mechanics. After changing the Scala code, they may 
follow the steps below to check the result: 
1. go to "build-target/bin" and start the server
2. use the web UI to upload the Scala examples' jar
3. at this point they would be confused about why their changes are not picked up.
Because build-target/examples only contains the Java examples, I suggest adding 
the Scala examples as well.
The new directory layout would look like this:
build-target/examples/java
build-target/examples/scala

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenliang613/flink FLINK-2537

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1123


commit 5a6735ce8cdf1555714377dc94f6ad90c41673c3
Author: chenliang613 
Date:   2015-09-12T15:18:10Z

FLINK-2537 Add scala examples.jar to build-target/examples




> Add scala examples.jar to build-target/examples
> ---
>
> Key: FLINK-2537
> URL: https://issues.apache.org/jira/browse/FLINK-2537
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 0.10
>Reporter: chenliang
>Assignee: chenliang
>Priority: Minor
>  Labels: maven
> Fix For: 0.10
>
>
> Scala, as a functional programming language, is being adopted by more and 
> more developers, and some newcomers may want to modify the Scala examples' 
> code to better understand Flink's mechanics. After changing the Scala code, 
> they may follow this method to check the result: 
> 1. go to "build-target/bin" and start the server
> 2. use the web UI to upload the Scala examples' jar
> 3. at this point they would be confused about why their changes are not picked up.
> Because build-target/examples only contains the Java examples, I suggest adding 
> the Scala examples as well.
> The new directory layout would look like this:
> build-target/examples/java
> build-target/examples/scala




[jira] [Commented] (FLINK-2659) Object reuse in UnionWithTempOperator

2015-09-12 Thread Stephan Ewen (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742076#comment-14742076 ]

Stephan Ewen commented on FLINK-2659:
-

Yes, this clearly looks like a bug.

Thank you for so thoroughly checking out the object-reuse mode...

> Object reuse in UnionWithTempOperator
> -
>
> Key: FLINK-2659
> URL: https://issues.apache.org/jira/browse/FLINK-2659
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Affects Versions: master
>Reporter: Greg Hogan
>
> The first loop in UnionWithTempOperator.run() executes until it reads null, and 
> the second loop then attempts to reuse this null value. [~StephanEwen], would 
> you like me to submit a pull request?
> Stack trace:
> {noformat}
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed.
>   at org.apache.flink.client.program.Client.run(Client.java:381)
>   at org.apache.flink.client.program.Client.run(Client.java:319)
>   at org.apache.flink.client.program.Client.run(Client.java:312)
>   at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
>   at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:790)
>   at Driver.main(Driver.java:376)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>   at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>   at org.apache.flink.client.program.Client.run(Client.java:278)
>   at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:630)
>   at org.apache.flink.client.CliFrontend.run(CliFrontend.java:318)
>   at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:953)
>   at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1003)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
>   at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:418)
>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:40)
>   at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>   at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>   at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:104)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.NullPointerException
>   at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:136)
>   at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
>   at org.apache.flink.runtime.plugable.ReusingDeserializationDelegate.read(ReusingDeserializationDelegate.java:57)
>   at org.apache.flink.
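For illustration, a self-contained sketch of the reuse-after-null pattern described in this report (plain Java, not the actual UnionWithTempOperator code):

{code:java}
import java.util.Iterator;

public class ReuseAfterNullSketch {

    // A reader in "object reuse" mode: it fills and returns the given record,
    // or returns null when the input is exhausted.
    static final class ReusingReader {
        private final Iterator<Integer> source;
        ReusingReader(Iterator<Integer> source) { this.source = source; }
        int[] next(int[] reuse) {
            if (!source.hasNext()) {
                return null;            // end of input
            }
            reuse[0] = source.next();   // fills the caller's record in place
            return reuse;
        }
    }

    public static void main(String[] args) {
        ReusingReader tempInput = new ReusingReader(java.util.Arrays.asList(1, 2, 3).iterator());
        ReusingReader secondInput = new ReusingReader(java.util.Arrays.asList(4, 5).iterator());

        int[] record = new int[1];

        // First loop: drains the first input. On the last call the reader returns
        // null, and that null is what the loop variable holds afterwards.
        while ((record = tempInput.next(record)) != null) {
            System.out.println("temp: " + record[0]);
        }

        // Second loop: without the reset below, 'record' would still be null and the
        // first next(record) call would dereference null inside the reader, which is
        // the NullPointerException pattern from the report. Handing the second loop
        // a fresh, non-null record is one way to avoid it in this sketch.
        record = new int[1];
        while ((record = secondInput.next(record)) != null) {
            System.out.println("second: " + record[0]);
        }
    }
}
{code}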

[jira] [Created] (FLINK-2661) Add a Node Splitting Technique to Overcome the Limitations of Skewed Graphs

2015-09-12 Thread Andra Lungu (JIRA)
Andra Lungu created FLINK-2661:
--

 Summary: Add a Node Splitting Technique to Overcome the 
Limitations of Skewed Graphs
 Key: FLINK-2661
 URL: https://issues.apache.org/jira/browse/FLINK-2661
 Project: Flink
  Issue Type: Task
  Components: Gelly
Affects Versions: 0.10
Reporter: Andra Lungu
Assignee: Andra Lungu


Skewed graphs raise unique challenges for computation models such as Gelly's 
vertex-centric or GSA iterations. This is mainly because these approaches 
process vertices uniformly, regardless of their degree distribution. 

In the vertex-centric model, for instance, a skewed node will take more time to 
process its neighbors than the other nodes in the graph. The former will act as a 
straggler, causing the latter to remain idle until it finishes its computation. 

This issue can be mitigated by splitting a high-degree node into subnodes and 
evenly distributing its edges to the resulting subvertices. The computation 
will then be performed on the split vertex. 

To this end, we should add a Splitting API on top of Gelly which can help:
- determine skewed nodes 
- split them
- merge them back at the end of the computation, given a user-defined combiner.

To illustrate the usage of these methods, we should add an example as well as a 
separate entry in the documentation.





[GitHub] flink pull request: Parameter Server: Distributed Key-Value store,...

2015-09-12 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1003#issuecomment-139742195
  
Having interfaces for a Parameter Server service in Flink is a very good 
idea, IMO. This interface can be implemented for different backends, such as 
Ignite or our own lightweight implementation.

However, I doubt that it is really necessary to bake the Parameter Server 
master into the JobManager. Can't this be a completely stand-alone service 
that Flink programs write to and read from via the provided interfaces?
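A rough sketch of what such an interface could look like, with entirely hypothetical names and shape, just to illustrate "provided interfaces over pluggable backends":

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ParameterStoreSketch {

    // Hypothetical client-side interface; a Flink program would only code against this.
    interface ParameterStore<K, V> {
        V get(K key);
        void update(K key, V value);
    }

    // One possible backend: a trivial in-memory store. An Ignite-backed or a
    // stand-alone-server-backed implementation could plug in behind the same interface.
    static class InMemoryParameterStore<K, V> implements ParameterStore<K, V> {
        private final Map<K, V> state = new ConcurrentHashMap<>();

        public V get(K key) {
            return state.get(key);
        }

        public void update(K key, V value) {
            state.put(key, value);
        }
    }

    public static void main(String[] args) {
        ParameterStore<String, double[]> store = new InMemoryParameterStore<>();
        store.update("weights", new double[]{0.1, 0.2});
        System.out.println(store.get("weights").length); // prints 2
    }
}
{code}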


[jira] [Commented] (FLINK-2557) Manual type information via "returns" fails in DataSet API

2015-09-12 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741981#comment-14741981 ]

ASF GitHub Bot commented on FLINK-2557:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1045#issuecomment-139741907
  
I am not familiar with the TypeExtractor in detail, but would support fixing 
the bug first and opening a separate issue to refactor the extractor, if that 
is possible.
@StephanEwen, @tillrohrmann What do you think?


> Manual type information via "returns" fails in DataSet API
> --
>
> Key: FLINK-2557
> URL: https://issues.apache.org/jira/browse/FLINK-2557
> Project: Flink
>  Issue Type: Bug
>  Components: Java API
>Reporter: Matthias J. Sax
>Assignee: Chesnay Schepler
>
> I changed the WordCount example as shown below and get an exception:
> The Tokenizer is changed to this (generics removed and a cast to String added):
> {code:java}
> public static final class Tokenizer implements FlatMapFunction {
>   public void flatMap(Object value, Collector out) {
>       String[] tokens = ((String) value).toLowerCase().split("\\W+");
>       for (String token : tokens) {
>           if (token.length() > 0) {
>               out.collect(new Tuple2(token, 1));
>           }
>       }
>   }
> }
> {code}
> I added a call to "returns()" here:
> {code:java}
> DataSet<Tuple2<String, Integer>> counts =
>   text.flatMap(new Tokenizer()).returns("Tuple2<String, Integer>")
>   .groupBy(0).sum(1);
> {code}
> The exception is:
> {noformat}
> Exception in thread "main" java.lang.IllegalArgumentException: The types of 
> the interface org.apache.flink.api.common.functions.FlatMapFunction could not 
> be inferred. Support for synthetic interfaces, lambdas, and generic types is 
> limited at this point.
>   at org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:686)
>   at org.apache.flink.api.java.typeutils.TypeExtractor.getParameterTypeFromGenericType(TypeExtractor.java:710)
>   at org.apache.flink.api.java.typeutils.TypeExtractor.getParameterType(TypeExtractor.java:673)
>   at org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:365)
>   at org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:279)
>   at org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:120)
>   at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:262)
>   at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:69)
> {noformat}
> Fix:
> This should not fail immediately, but instead only produce a "MissingTypeInfo" 
> so that type hints would work.
> The error message is also wrong, btw: it should state that raw types are not 
> supported.
> The issue has been reported here: 
> http://stackoverflow.com/questions/32122495/stuck-with-type-hints-in-clojure-for-generic-class
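For contrast, a minimal sketch (not part of the report above) of the fully generic Tokenizer, for which the TypeExtractor can read the output type from the interface so that no returns() hint is needed; this assumes the 0.9/0.10-era DataSet API imports:

{code:java}
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
        // Same logic as in the report, but with the generic parameters kept in place.
        String[] tokens = value.toLowerCase().split("\\W+");
        for (String token : tokens) {
            if (token.length() > 0) {
                out.collect(new Tuple2<String, Integer>(token, 1));
            }
        }
    }
}
{code}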


