[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300415#comment-15300415 ] ASF GitHub Bot commented on FLINK-2044: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1956 > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > Fix For: 1.1.0 > > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300409#comment-15300409 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-221637781 Thanks @gallenvara! All tests pass now. I will merge :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300238#comment-15300238 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-221616393 @vasia In the travis failure reports, the failures are relevant with flink-yarn-tests module. I have merged the latest code from master and rebase all my commit in this PR. And the `HITSAlgorithmITCase` runs successfully in local. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300178#comment-15300178 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-221603967 The failures were in the `HITSAlgorithmITCase`. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300177#comment-15300177 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-221603165 @vasia in which module were the failures? master has been quite unstable recently now that tests are properly failing. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300162#comment-15300162 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-221600182 Hey @gallenvara, I was about to merge this, but I see test failures after rebasing on top of master. Can you please (1) rebase on top of the latest master and squash your commits and (2) investigate what's wrong with the tests? Thank you! > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299667#comment-15299667 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-221504600 Thank you @gallenvara. I'll merge later today. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299312#comment-15299312 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-221452144 @vasia fixed! > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298855#comment-15298855 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-221394386 Hey @gallenvara, I had a private chat with @greghogan about this PR. We think that we should change the label type to a boolean instead of string. It should make a difference for large graph inputs. After this last change we'll go ahead and finally merge this :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295405#comment-15295405 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64144375 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) { return initVertexValue; } } + + public static class AuthorityEdgeMapper implements MapFunction , String> { --- End diff -- @greghogan @vasia In the use of HITS algorithm, bidirectional edges is not very common, so I do not have bidirectional edge determination logic written into the code and all of the edges have joined their reverse edges. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293669#comment-15293669 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220656004 Stanford provides several old graph datasets at https://snap.stanford.edu/data/index.html which might prove a better standard for benchmarking. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293667#comment-15293667 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64070513 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) { return initVertexValue; } } + + public static class AuthorityEdgeMapper implements MapFunction , String> { --- End diff -- A boolean is stored as a byte as would a bitmask which allows the edge set to be compressed on bidirectional edges. Not sure how common bidirectional edges are in real-world datasets. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293519#comment-15293519 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220633090 @vasia vertex num: 1, edge num: 3; vertex num: 3, edge num: 10. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293511#comment-15293511 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64056511 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) { return initVertexValue; } } + + public static class AuthorityEdgeMapper implements MapFunction , String> { --- End diff -- I mean that they had a small difference in running time. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293501#comment-15293501 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64056167 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -243,18 +255,22 @@ public void sendMessages(Vertex> vertex) { iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue()); } for (Edge edge : getEdges()) { - K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget(); - K messageTarget = getSuperstepNumber() % 2 == 1 ? edge.getTarget() : edge.getSource(); - double messageValue = getSuperstepNumber() % 2 == 1 ? vertex.getValue().f0.getValue() : vertex.getValue().f1.getValue(); - - if (!messageTarget.equals(vertex.getId())) { - if (getSuperstepNumber() != maxIteration) { - sendMessageTo(messageTarget, messageValue / iterationValueSum); - - // in order to make every vertex updated - sendMessageTo(messageSource, 0.0); + if (getSuperstepNumber() != maxIteration) { + if (getSuperstepNumber() % 2 == 1) { + if (edge.getValue().equals("Authority")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f0.getValue() / iterationValueSum); + } } else { - sendMessageTo(messageSource, iterationValueSum); + if (edge.getValue().equals("Hub")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f1.getValue() / iterationValueSum); + } + } + + // make all the vertices be updated + sendMessageTo(edge.getSource(), 0.0); --- End diff -- Fixed with rebasing the previous commit. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293499#comment-15293499 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64056076 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) { return initVertexValue; } } + + public static class AuthorityEdgeMapper implements MapFunction , String> { --- End diff -- Before committing the newest codes, i have test the edge with `String` label and `boolean` label. And i found that `boolean` label is not fast than the `String` label. So finally i used the "hub" & "authority", also it's very intuitionistic. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293488#comment-15293488 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64055156 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -243,18 +255,22 @@ public void sendMessages(Vertex> vertex) { iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue()); } for (Edge edge : getEdges()) { - K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget(); - K messageTarget = getSuperstepNumber() % 2 == 1 ? edge.getTarget() : edge.getSource(); - double messageValue = getSuperstepNumber() % 2 == 1 ? vertex.getValue().f0.getValue() : vertex.getValue().f1.getValue(); - - if (!messageTarget.equals(vertex.getId())) { - if (getSuperstepNumber() != maxIteration) { - sendMessageTo(messageTarget, messageValue / iterationValueSum); - - // in order to make every vertex updated - sendMessageTo(messageSource, 0.0); + if (getSuperstepNumber() != maxIteration) { + if (getSuperstepNumber() % 2 == 1) { + if (edge.getValue().equals("Authority")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f0.getValue() / iterationValueSum); + } } else { - sendMessageTo(messageSource, iterationValueSum); + if (edge.getValue().equals("Hub")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f1.getValue() / iterationValueSum); + } + } + + // make all the vertices be updated + sendMessageTo(edge.getSource(), 0.0); --- End diff -- Sorry, this line should be moved. :) In the previous commit without edge label, i use the same edge for hub and authority updating and this line would keep every vertex updating. But in the newest implementation with edge label, we can drop this because the added edges can do the same work. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293473#comment-15293473 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220628229 Thank you for the update @gallenvara and for your patience with our continuous comments :) How many edges did the graphs in your experiment have? > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293470#comment-15293470 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64054001 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -273,4 +289,23 @@ public void sendMessages(Vertex> vertex) { return initVertexValue; } } + + public static class AuthorityEdgeMapper implements MapFunction , String> { --- End diff -- Ah when I proposed to label the edges as "authority" or "hub" I didn't really mean to add a `String` label :) We can do this with a boolean. @greghogan what do you think? > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293452#comment-15293452 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r64051693 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -243,18 +255,22 @@ public void sendMessages(Vertex> vertex) { iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue()); } for (Edge edge : getEdges()) { - K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget(); - K messageTarget = getSuperstepNumber() % 2 == 1 ? edge.getTarget() : edge.getSource(); - double messageValue = getSuperstepNumber() % 2 == 1 ? vertex.getValue().f0.getValue() : vertex.getValue().f1.getValue(); - - if (!messageTarget.equals(vertex.getId())) { - if (getSuperstepNumber() != maxIteration) { - sendMessageTo(messageTarget, messageValue / iterationValueSum); - - // in order to make every vertex updated - sendMessageTo(messageSource, 0.0); + if (getSuperstepNumber() != maxIteration) { + if (getSuperstepNumber() % 2 == 1) { + if (edge.getValue().equals("Authority")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f0.getValue() / iterationValueSum); + } } else { - sendMessageTo(messageSource, iterationValueSum); + if (edge.getValue().equals("Hub")) { + sendMessageTo(edge.getTarget(), vertex.getValue().f1.getValue() / iterationValueSum); + } + } + + // make all the vertices be updated + sendMessageTo(edge.getSource(), 0.0); --- End diff -- Why do we need to send this message? > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290897#comment-15290897 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220292049 Hi, @vasia @greghogan . Thanks a lot with your advice! I have modify the codes to support the iteration with `EdgeDirection.OUT` instead of `EdgeDirection.ALL` with edge label. And i write a test with big graph to compare the two implementations, the result is: ![hits_compare](https://cloud.githubusercontent.com/assets/12931563/15391189/627af152-1df2-11e6-9e43-33706bd00cdf.PNG) The number of edge is same and the relation between every two nodes is randomly added. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290887#comment-15290887 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r63856640 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -122,22 +179,45 @@ public void updateVertex(Vertex> vertex, Mes DoubleValue newAuthorityValue = vertex.getValue().f1; if (getSuperstepNumber() > 1) { - iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("sumVertexValue")).getValue()); + iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue()); } - if (getSuperstepNumber() != maxIteration) { + if (getSuperstepNumber() < maxIteration) { if (getSuperstepNumber() % 2 == 1) { - newHubValue.setValue(newHubValue.getValue() / iterationValueSum); - newAuthorityValue.setValue(updateValue); + + //in the first iteration, the diff is the authority value of each vertex + double previousAuthAverage = 1.0; + double diffValueSum = 1.0 * numberOfVertices; + if (getSuperstepNumber() > 1) { + previousAuthAverage = ((DoubleValue) getPreviousIterationAggregate("authorityValueSum")).getValue() / numberOfVertices; + diffValueSum = ((DoubleValue) getPreviousIterationAggregate("diffValueSum")).getValue(); + } + authoritySumAggregator.aggregate(previousAuthAverage); + + if (diffValueSum > convergeThreshold) { + newHubValue.setValue(newHubValue.getValue() / iterationValueSum); + newAuthorityValue.setValue(updateValue); + } else { + + //scores are converged and stop iteration + maxIteration = getSuperstepNumber(); --- End diff -- This line can stop the iteration after last vertex updating(final updating of hub normalization). If drop this line, the iteration will go on until `getSuperstep == maxIteration` because there are always some vertices can be updated. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290888#comment-15290888 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r63856679 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -46,40 +46,89 @@ * represented a page that is linked by many different hubs. * Each vertex has a value of Tuple2 type, the first field is hub score and the second field is authority score. * The implementation assumes that the two score are the same in each vertex at the beginning. + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. * * * @see https://en.wikipedia.org/wiki/HITS_algorithm;>HITS Algorithm */ public class HITSAlgorithmimplements GraphAlgorithm Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289736#comment-15289736 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220147037 I also like @vasia's idea since we might get to add the analytic `DyadicCensus` (basically counting how many edges are `u <-> v` and `u -> v`). The relative performance of this implementation will depend on the ratio of mutual edges. Are there one or more canonical data sets for running a comparison? The two extremes are an undirected graph and a graph with no mutual edges, which could be artificially constructed. Also, I recommend copying the current code into a new class to test a new implementation. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289713#comment-15289713 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220143133 @vasia that is a very good idea. I've been doing this for `TriangleListing` for directed clustering coefficient and it should probably be a separate `GraphAlgorithm`. There is a bitmask for whether the edge is `u -> v`, `u <- v`, or `u <-> v`. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289658#comment-15289658 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-220134968 Thanks for the update @gallenvara and @greghogan for the review. I have left a few comments in places where I couldn't figure out the code logic. There is one more thing I'd like to discuss with you. The current implementation is using `EdgeDirection.ALL`, but this feature is not very efficiently implemented. It actually performs 2 coGroups (one on the source Id and one on the target Id) and unions the outputs to create the result of the messaging function. Instead, I was thinking we could do the following: mark the inputs graph edges with an "authority" label and add opposite-direction edges with a "hub" authority. Then, we can use the default edge direction (IN) to perform the iteration. In even supersteps, each vertex sends messages only along "authority" edges and in odd supersteps only along "hub" edges. Does this make sense? > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289639#comment-15289639 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r63765556 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -122,22 +179,45 @@ public void updateVertex(Vertex> vertex, Mes DoubleValue newAuthorityValue = vertex.getValue().f1; if (getSuperstepNumber() > 1) { - iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("sumVertexValue")).getValue()); + iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue()); } - if (getSuperstepNumber() != maxIteration) { + if (getSuperstepNumber() < maxIteration) { if (getSuperstepNumber() % 2 == 1) { - newHubValue.setValue(newHubValue.getValue() / iterationValueSum); - newAuthorityValue.setValue(updateValue); + + //in the first iteration, the diff is the authority value of each vertex + double previousAuthAverage = 1.0; + double diffValueSum = 1.0 * numberOfVertices; + if (getSuperstepNumber() > 1) { + previousAuthAverage = ((DoubleValue) getPreviousIterationAggregate("authorityValueSum")).getValue() / numberOfVertices; + diffValueSum = ((DoubleValue) getPreviousIterationAggregate("diffValueSum")).getValue(); + } + authoritySumAggregator.aggregate(previousAuthAverage); + + if (diffValueSum > convergeThreshold) { + newHubValue.setValue(newHubValue.getValue() / iterationValueSum); + newAuthorityValue.setValue(updateValue); + } else { + + //scores are converged and stop iteration + maxIteration = getSuperstepNumber(); --- End diff -- I don't think this assignment has any effect? > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289637#comment-15289637 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r63765397 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -46,40 +46,89 @@ * represented a page that is linked by many different hubs. * Each vertex has a value of Tuple2 type, the first field is hub score and the second field is authority score. * The implementation assumes that the two score are the same in each vertex at the beginning. + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. * * * @see https://en.wikipedia.org/wiki/HITS_algorithm;>HITS Algorithm */ public class HITSAlgorithmimplements GraphAlgorithm Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289640#comment-15289640 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r63765714 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -160,7 +240,7 @@ public void sendMessages(Vertex> vertex) { double iterationValueSum = 1.0; if (getSuperstepNumber() > 1) { - iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("sumVertexValue")).getValue()); + iterationValueSum = Math.sqrt(((DoubleValue) getPreviousIterationAggregate("updatedValueSum")).getValue()); } for (Edge edge : getEdges()) { K messageSource = getSuperstepNumber() % 2 == 1 ? edge.getSource() : edge.getTarget(); --- End diff -- Could you please explain what is this and the following line doing? The condition is the same in both. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289633#comment-15289633 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r63765321 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -46,40 +46,89 @@ * represented a page that is linked by many different hubs. * Each vertex has a value of Tuple2 type, the first field is hub score and the second field is authority score. * The implementation assumes that the two score are the same in each vertex at the beginning. + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. * * * @see https://en.wikipedia.org/wiki/HITS_algorithm;>HITS Algorithm */ public class HITSAlgorithmimplements GraphAlgorithm Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280541#comment-15280541 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-218543446 @greghogan thanks and relevant codes have been modified. :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280280#comment-15280280 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r62868637 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.graph.utils.NullValueEdgeMapper; +import org.apache.flink.types.DoubleValue; +import org.apache.flink.types.NullValue; +import org.apache.flink.util.Preconditions; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represents a page that points to many other pages, and a good authority + * represented a page that is linked by many different hubs. + * Each vertex has a value of Tuple2 type, the first field is hub score and the second field is authority score. + * The implementation assumes that the two score are the same in each vertex at the beginning. + * + * + * @see https://en.wikipedia.org/wiki/HITS_algorithm;>HITS Algorithm + */ +public class HITSAlgorithmimplements GraphAlgorithm 0, "The number of maximum iteration should be greater than 0."); + this.maxIterations = maxIterations * 2 + 1; + } + + @Override + public DataSet >> run(Graph netGraph) throws Exception { + + ScatterGatherConfiguration parameter = new ScatterGatherConfiguration(); + parameter.setDirection(EdgeDirection.ALL); + parameter.registerAggregator("sumVertexValue", new DoubleSumAggregator()); + + return netGraph + .mapVertices(new VertexInitMapper ()) + .mapEdges(new NullValueEdgeMapper ()) --- End diff -- Yes, and i will modify the code. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280150#comment-15280150 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-218467237 @vasia Thanks a lot and PR has been updated as you advised. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280047#comment-15280047 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-218446355 Thanks for the updates @gallenvara. I left a few minor comments. Could you please also add the algorithm in the Gelly documentation under "library methods"? It should be good to merge after that :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280042#comment-15280042 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r62838798 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.graph.utils.NullValueEdgeMapper; +import org.apache.flink.types.DoubleValue; +import org.apache.flink.types.NullValue; +import org.apache.flink.util.Preconditions; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * + * @see https://en.wikipedia.org/wiki/HITS_algorithm;>HITS Algorithm + */ --- End diff -- Can you also please add a comment about the result type? Which tuple field is the authority score and which is the hub? > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280040#comment-15280040 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r62838726 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.graph.utils.NullValueEdgeMapper; +import org.apache.flink.types.DoubleValue; +import org.apache.flink.types.NullValue; +import org.apache.flink.util.Preconditions; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on --- End diff -- represented => represents pointed => points was linked => is linked the two value => the two values* > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280037#comment-15280037 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r62837591 --- Diff: flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/data/HITSData.java --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.examples.data; + +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Vertex; +import org.apache.flink.types.NullValue; + +import java.util.ArrayList; +import java.util.List; + +/** + * Provides the default data set used for the HITS test program. + * If no parameters are given to the program, the default edge data set is used. --- End diff -- I guess you copied this comment from another similar class. The "If no parameters given..." refers to Gelly examples, which run with default data if no parameters are provided. In this case HITS is implemented as a library method, so this comment can be removed. This data is only used for testing as far as I can tell :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276496#comment-15276496 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r62518520 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.graph.utils.NullValueEdgeMapper; +import org.apache.flink.types.DoubleValue; +import org.apache.flink.types.NullValue; +import org.apache.flink.util.Preconditions; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + * + * @see https://en.wikipedia.org/wiki/HITS_algorithm;>HITS Algorithm + */ +public class HITSAlgorithmimplements GraphAlgorithm Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276469#comment-15276469 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r62514639 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.graph.utils.NullValueEdgeMapper; +import org.apache.flink.types.DoubleValue; +import org.apache.flink.types.NullValue; +import org.apache.flink.util.Preconditions; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + * + * @see https://en.wikipedia.org/wiki/HITS_algorithm;>HITS Algorithm + */ +public class HITSAlgorithmimplements GraphAlgorithm 0, "The number of maximum iteration should be greater than 0."); + this.maxIterations = maxIterations * 2 + 1; + } + + @Override + public DataSet >> run(Graph netGraph) throws Exception { + + ScatterGatherConfiguration parameter = new ScatterGatherConfiguration(); + parameter.setDirection(EdgeDirection.ALL); + parameter.registerAggregator("sumVertexValue", new DoubleSumAggregator()); + + return netGraph + .mapVertices(new VertexInitMapper ()) + .mapEdges(new NullValueEdgeMapper ()) + .runScatterGatherIteration(new VertexUpdate(maxIterations), + new MessageUpdate (maxIterations), maxIterations, parameter) + .getVertices(); + } + + /** +* Function that updates the value of a vertex by summing up the partial +* values from all messages and normalize the value. +*/ + @SuppressWarnings("serial") + public static final class VertexUpdate extends VertexUpdateFunction , Double> { + private int maxIteration; + private
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276468#comment-15276468 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r62514632 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.graph.utils.NullValueEdgeMapper; +import org.apache.flink.types.DoubleValue; +import org.apache.flink.types.NullValue; +import org.apache.flink.util.Preconditions; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + * + * @see https://en.wikipedia.org/wiki/HITS_algorithm;>HITS Algorithm + */ +public class HITSAlgorithmimplements GraphAlgorithm Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276334#comment-15276334 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r62493996 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.graph.utils.NullValueEdgeMapper; +import org.apache.flink.types.DoubleValue; +import org.apache.flink.types.NullValue; +import org.apache.flink.util.Preconditions; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + * + * @see https://en.wikipedia.org/wiki/HITS_algorithm;>HITS Algorithm + */ +public class HITSAlgorithmimplements GraphAlgorithm Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276328#comment-15276328 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r62493213 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.graph.utils.NullValueEdgeMapper; +import org.apache.flink.types.DoubleValue; +import org.apache.flink.types.NullValue; +import org.apache.flink.util.Preconditions; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + * + * @see https://en.wikipedia.org/wiki/HITS_algorithm;>HITS Algorithm + */ +public class HITSAlgorithmimplements GraphAlgorithm 0, "The number of maximum iteration should be greater than 0."); + this.maxIterations = maxIterations * 2 + 1; + } + + @Override + public DataSet >> run(Graph netGraph) throws Exception { + + ScatterGatherConfiguration parameter = new ScatterGatherConfiguration(); + parameter.setDirection(EdgeDirection.ALL); + parameter.registerAggregator("sumVertexValue", new DoubleSumAggregator()); + + return netGraph + .mapVertices(new VertexInitMapper ()) + .mapEdges(new NullValueEdgeMapper ()) + .runScatterGatherIteration(new VertexUpdate(maxIterations), + new MessageUpdate (maxIterations), maxIterations, parameter) + .getVertices(); + } + + /** +* Function that updates the value of a vertex by summing up the partial +* values from all messages and normalize the value. +*/ + @SuppressWarnings("serial") + public static final class VertexUpdate extends VertexUpdateFunction , Double> { + private int maxIteration; + private
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274561#comment-15274561 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-217527593 Hi, @vasia @greghogan , - code modified and support to return both hub and authority score as `Tuple2` type now. The implementation will run extra one iteration to normalize the hub value in the end. With scatter-gather being implemented, the GSA version is easy to write. - As for the threshold, the algorithm should check neighboring hub or authority iteration whether there are vertex updating. It's a little difficult and from my view, the` maxIteration` parameter play a similar role with threshold. - I will open another issue for GSA version and relevant tests. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273941#comment-15273941 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-217419465 Hi @gallenvara, @greghogan, - If it's possible to return both the hub and the authority value, I'd prefer that. - GSA iterations allow setting the edge direction as Greg suggested. I'm not sure how much of a difference the combiner would make. Also, we've seen that for some graphs scatter-gather performs better. Personally, I would be fine with a first scatter-gather version of the algorithm. We can run some tests to see whether GSA would be faster later. - I agree that edge values should be internally set to `NullValue` if not used. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272256#comment-15272256 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-217142445 1) Yes, that sounds right. The update function can choose which of Hub or Authority to set. 2) Does `GSAConfiguration.setDirection` work the same as in scatter-gather? 4) The algorithm will be faster if you first set the edge values to `NullValue`. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271841#comment-15271841 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-217070480 Thanks a lot, @greghogan @vasia . My limited understand on the tips your have mentioned : 1) The PR implemented HITS by dividing hub updating into two phases. Value updating and normalization limit the two phased can not be handled in the same `superstep`. IMO, we can cache the hub updating result and send them to next authority iteration and package final authority and hub value as `Tuple2` type to return. What's your opinion on this? 2) GSA does not support for choosing edge direction and each vertex will be updated based on the values of its in-neighbors only. In the implementation, hub updating use the value of target vertex where the edge direction is out, authority updating used the value of the value of source vertex where the edge direction is in. IMO, it does not work for hub updating procedure if used GSA. 3)Yes, vertices have been initialized in the test. It should be better to be set into the algortithm before the first iteration using `Graph.translateVertexValues`. 4) Because the edge value not used, the translation is optional and it can keep its original value and type. 5) Yes, adding threshold may reduce iteration time for the case of small graph with great `maxIteration`. (Does the scatter-gather or GSA have a default threshold to check for no value updating during the iteration?) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271172#comment-15271172 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-216961955 @vasia likely has additional opinions and insight for the following ... 1) The algorithm should return both the hub score and authority score. This requires unrolling an additional half-step after the iteration concludes. If we start by computing authority, then alternatively compute hub and authority in the iteration (such that authority is the iteration output), then we need to do one further computation of hub which can be outer-joined with the authority. 2) Would this be better as a GSA algorithm which would use a combiner to reduce the scores? [https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/gelly.html#iteration-abstractions-comparison] 3) I don't see that the input vertex values are used. If these require an initial type or value (is this mandated by the scatter-gather API?) then we can parameterize the algorithm and translate the vertices to the proper type and/or value using `Graph.translateVertexValues`. 4) Same for edge values, which can be translated to `NullValue`. 5) I'm assuming we can use a convergence threshold. 6) From what I have read the normalization is performed by dividing by root-sum-square. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270061#comment-15270061 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61986359 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; +import org.apache.flink.util.Preconditions; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithmimplements GraphAlgorithm >> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + Preconditions.checkArgument(maxIterations > 0, "The number of maximum iteration should be greater than 0."); + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + this(maxIterations, hitsParameter); + Preconditions.checkArgument(numberOfVertices > 0, "The number of vertices in graph should be greater than 0."); + this.numberOfVertices = numberOfVertices; + } + + @Override + public DataSet > run(Graph netGraph) throws Exception { + if (this.numberOfVertices ==
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269194#comment-15269194 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61925324 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; +import org.apache.flink.util.Preconditions; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithmimplements GraphAlgorithm >> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + Preconditions.checkArgument(maxIterations > 0, "The number of maximum iteration should be greater than 0."); + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + this(maxIterations, hitsParameter); + Preconditions.checkArgument(numberOfVertices > 0, "The number of vertices in graph should be greater than 0."); + this.numberOfVertices = numberOfVertices; + } + + @Override + public DataSet > run(Graph netGraph) throws Exception { + if (this.numberOfVertices ==
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268553#comment-15268553 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61868075 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { --- End diff -- Done! > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268540#comment-15268540 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61866752 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { --- End diff -- yes, you are right. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268535#comment-15268535 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61866283 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { --- End diff -- It can be parametrized with "EV" and the algorithm can set it to `NullValue` internally. This way, users won't have to first map their input graphs to `NullValue` edge value types. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268524#comment-15268524 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61865225 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { --- End diff -- The edge value is not used throughout the process. It would be better to set to `NullValue` as hard code, IMO. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268518#comment-15268518 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61864334 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { --- End diff -- Haven't looked at your latest commit, but you can parameterize with "EV" as you have with "K". > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268085#comment-15268085 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-216433392 @greghogan @vasia thanks a lot for your review and codes have been modified. :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267946#comment-15267946 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61832489 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { --- End diff -- may be `this(maxIterations, hitParameter);` :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267901#comment-15267901 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61831147 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + this.numberOfVertices = numberOfVertices; + } + + @Override + public DataSet > run(Graph netGraph) throws Exception { + if (this.numberOfVertices == 0) { + this.numberOfVertices = netGraph.numberOfVertices(); + } + +
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267897#comment-15267897 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61831064 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { --- End diff -- not used, may be good to set to `NullValue` > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267211#comment-15267211 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61780511 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + this.numberOfVertices = numberOfVertices; + } + + @Override + public DataSet > run(Graph netGraph) throws Exception { + if (this.numberOfVertices == 0) { + this.numberOfVertices = netGraph.numberOfVertices(); + } + +
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267206#comment-15267206 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61780238 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + this.numberOfVertices = numberOfVertices; + } + + @Override + public DataSet > run(Graph netGraph) throws Exception { + if (this.numberOfVertices == 0) { + this.numberOfVertices = netGraph.numberOfVertices(); + } + +
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267049#comment-15267049 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61772427 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + this.numberOfVertices = numberOfVertices; + } + + @Override + public DataSet > run(Graph netGraph) throws Exception { --- End diff -- The two phases are depended on each other. Hub can update until authority updated and normalized, also the same to authority. So
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267033#comment-15267033 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61770727 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + this.numberOfVertices = numberOfVertices; --- End diff -- Thanks for the contribution :) Pull requests are always asynchronous. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 >
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267012#comment-15267012 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61769569 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + this.numberOfVertices = numberOfVertices; + } + + @Override + public DataSet > run(Graph netGraph) throws Exception { --- End diff -- Could we return both the hub score and authority score in a `Tuple2` rather than having the user choose between the two scores? >
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266984#comment-15266984 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61767543 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + this.numberOfVertices = numberOfVertices; + } + + @Override + public DataSet > run(Graph netGraph) throws Exception { + if (this.numberOfVertices == 0) { + this.numberOfVertices = netGraph.numberOfVertices(); + } + +
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266974#comment-15266974 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61766847 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + this.numberOfVertices = numberOfVertices; --- End diff -- Thanks for review :) i will modify codes tomorrow because my computer is not by my side now. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL:
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266947#comment-15266947 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61764570 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + this.numberOfVertices = numberOfVertices; --- End diff -- Verify positive `numberOfVertices` with `Preconditions.checkArgument`. Same for `maxIterations` in other constructor. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL:
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266941#comment-15266941 ] ASF GitHub Bot commented on FLINK-2044: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1956#discussion_r61764033 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + +import org.apache.flink.api.common.aggregators.DoubleSumAggregator; +import org.apache.flink.api.java.DataSet; + +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.spargel.MessageIterator; +import org.apache.flink.graph.spargel.MessagingFunction; +import org.apache.flink.graph.spargel.ScatterGatherConfiguration; +import org.apache.flink.graph.spargel.VertexUpdateFunction; +import org.apache.flink.types.DoubleValue; + +/** + * This is an implementation of HITS algorithm, using a scatter-gather iteration. + * The user can define the maximum number of iterations. HITS algorithm is determined by two parameters, + * hubs and authorities. A good hub represented a page that pointed to many other pages, and a good authority + * represented a page that was linked by many different hubs. The implementation assumes that the two value on + * every vertex are the same at the beginning. + * + * If the number of vertices of the input graph is known, it should be provided as a parameter + * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices. + */ +public class HITSAlgorithm implements GraphAlgorithm>> { + + public static enum HITSParameter { + HUB, + AUTHORITY + } + + private int maxIterations; + private long numberOfVertices; + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is known, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { + this.maxIterations = maxIterations * 2 + 1; + } + } + + /** +* Creates an instance of HITS algorithm. +* If the number of vertices of the input graph is unknown, +* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} constructor instead. +* +* @param maxIterations the maximum number of iterations +* @param hitsParameter the type of final web pages users want to get by this algorithm +*/ + public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) { + if (hitsParameter == HITSParameter.AUTHORITY) { + this.maxIterations = maxIterations * 2; + } else { --- End diff -- Can replace 82:87 with `super(maxIterations, hitsParameter);`. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266691#comment-15266691 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-216251223 @vasia code modified and i add a extra iteration for getting the aggregated value of previous iteration to normalize all vertex. :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266556#comment-15266556 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-216227985 I will have a try. If `iteration==1` or `iteration==maxIteration` maybe a little difficult to deal. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266536#comment-15266536 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-216226071 There is no way to get the aggregated value of the current superstep, as the aggregation happens at the superstep barrier. What you could do is normalize the vertex value in the `MessagingFunction` of the next superstep. E.g. say in superstep `i` you need to set each vertex value to `v / sum(i)`. You can instead set it to `v` and then propagate `v / sum(i)` in the scatter phase of superstep `i+1`. Would that work? > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266517#comment-15266517 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-216224456 @vasia i have noticed the `registerAggregator` can aggregate value in distributed environment and i have used it at the very beginning of this issue. But finally i failed because the aggregated value can only be get in the next `superstep`. I want to get the value in `VertexUpdateFunction` of current `superstep`. Can you give me some suggestions? > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266503#comment-15266503 ] ASF GitHub Bot commented on FLINK-2044: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-216222300 Hi @gallenvara, thank you for the PR! I only took a quick look, but I will go through the changes in detail soon. One thing I noticed is that you use static variables for performing the normalization. This won't work in a distributed environment. You will need to use aggregators instead. Take a look at `IterationConfiguration.registerAggregator`. `VertexUpdateFunction` and `MessagingFunction` have methods to retrieve an aggregator and an aggregated value from the previous superstep. Let me know if you need help :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266344#comment-15266344 ] ASF GitHub Bot commented on FLINK-2044: --- Github user gallenvara commented on the pull request: https://github.com/apache/flink/pull/1956#issuecomment-216190461 @vasia Can you help with review work? :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266342#comment-15266342 ] ASF GitHub Bot commented on FLINK-2044: --- GitHub user gallenvara opened a pull request: https://github.com/apache/flink/pull/1956 [FLINK-2044] [gelly] Implementation of Gelly HITS Algorithm Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes. - [X] General - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text") - The pull request addresses only one issue - Each commit in the PR has a meaningful commit message (including the JIRA id) - [X] Documentation - Documentation has been added for new functionality - Old documentation affected by the pull request has been updated - JavaDoc for public methods has been added - [X] Tests & Build - Functionality added by the pull request is covered by tests - `mvn clean verify` has been executed successfully locally or a Travis build has passed ·Implement HITS Algorithm by dividing the update of `hubs` and `authorities` into two processes. ·If users want to find final nice hub pages, they can set the HITSParameter to `HUB`, the same as `AUTHORITY`. ·Use sum normalization to normalize the vertex value. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gallenvara/flink HITS_Algorithm Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1956.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1956 commit 96b4ed1e91d08d992f55b849b451eb496d49b5dd Author: gallenvaraDate: 2016-05-02T09:58:17Z Implementation of HITS algorithm. > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Assignee: GaoLun >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266206#comment-15266206 ] Vasia Kalavri commented on FLINK-2044: -- Thanks, I'll assign it to you :) > Implementation of Gelly HITS Algorithm > -- > > Key: FLINK-2044 > URL: https://issues.apache.org/jira/browse/FLINK-2044 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Ahamd Javid >Priority: Minor > > Implementation of Hits Algorithm in Gelly API using Java. the feature branch > can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566641#comment-14566641 ] Mohammad Fahim Azizi commented on FLINK-2044: - Hi [~vkalavri], as I mentioned above, there is some limitation in vertexUpdate function(changing of edge direction and accessing edges inside this function). so we commit our code where we must use for loop with runVertexCentricIteration function. [The feature branch can be found here. | https://github.com/mfahimazizi/flink/commits/HITS] Implementation of Gelly HITS Algorithm -- Key: FLINK-2044 URL: https://issues.apache.org/jira/browse/FLINK-2044 Project: Flink Issue Type: Improvement Components: Gelly Reporter: Ahamd Javid Priority: Minor Implementation of Hits Algorithm in Gelly API using Java. the feature branch can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14562654#comment-14562654 ] Mohammad Fahim Azizi commented on FLINK-2044: - Hey [~vkalavri], the hits algorithm has two phases, hub rule and authority rule. these two phases are must calculate simultaneously. to use runVertexCentricIteration() for this algorithm, we apply following two different approaches but we face with some problems (inside updateVertex function). 1. For switching between hub rule and authority rule phases we used getSuperstepNumber() inside of updateVertex function but problem is updateVertex function doesn't have functionality to change direction of edges inside it. 2. to over come with this problem, we used undirected graph with labeled edges as H and A (mean first we label the edges of directed graph as A then create revese of this graph and label the edges as H and then union). but the problem is accessing of edge values inside updateVertex function. Is there any solution to access edges inside updateVertex function? Implementation of Gelly HITS Algorithm -- Key: FLINK-2044 URL: https://issues.apache.org/jira/browse/FLINK-2044 Project: Flink Issue Type: Improvement Components: Gelly Reporter: Ahamd Javid Priority: Minor Implementation of Hits Algorithm in Gelly API using Java. the feature branch can be found here: (https://github.com/JavidMayar/flink/commits/HITS) -- This message was sent by Atlassian JIRA (v6.3.4#6332)