[jira] [Commented] (FLINK-4204) Clean up gelly-examples
[ https://issues.apache.org/jira/browse/FLINK-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602505#comment-15602505 ]

ASF GitHub Bot commented on FLINK-4204:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2670

> Clean up gelly-examples
> ---
>
> Key: FLINK-4204
> URL: https://issues.apache.org/jira/browse/FLINK-4204
> Project: Flink
> Issue Type: Improvement
> Components: Gelly
> Affects Versions: 1.1.0
> Reporter: Vasia Kalavri
> Assignee: Greg Hogan
> Fix For: 1.2.0
>
> The gelly-examples has grown quite big (14 examples) and contains several
> examples that illustrate the same functionality. Examples should help users
> understand how to use the API and ideally show how to use 1-2 features.
> Also, it is helpful to state the purpose of each example in the comments.
> We should keep the example set small and move everything that does not fit
> there to the library.
> I propose to remove the following:
> - ClusteringCoefficient: the functionality already exists as a library method.
> - HITS: the functionality already exists as a library method.
> - JaccardIndex: the functionality already exists as a library method.
> - SingleSourceShortestPaths: the example shows how to use scatter-gather
> iterations. HITSAlgorithm shows the same feature plus the use of aggregators.
> I propose we keep this one instead.
> - TriangleListing: the functionality already exists as a library method

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-4204) Clean up gelly-examples
[ https://issues.apache.org/jira/browse/FLINK-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595767#comment-15595767 ]

ASF GitHub Bot commented on FLINK-4204:
---

Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/2670

I pushed a commit to remove the `GraphMetrics` example.

I think providing drivers for all library methods is both desirable and ambitious. If we like the form and functionality of the current drivers then I'd like to look at consolidating common functionality where possible. We may also be able to put multiple similar algorithms like `JaccardIndex` / `AdamicAdar` / `CommonNeighbors` into the same driver.

I had first removed `TriangleListing` as it's not an algorithm but I added it back due to Facebook's recent benchmarking: https://code.facebook.com/posts/319004238457019/a-comparison-of-state-of-the-art-graph-processing-systems
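The idea of serving `JaccardIndex` / `AdamicAdar` / `CommonNeighbors` from one driver could look roughly like the sketch below. It is plain Java over an in-memory adjacency map and does not use the Gelly API; the class name `SimilarityDriver`, the method names, and the string-based algorithm selection are all assumptions made for illustration. The three scores use their textbook definitions, which share the same common-neighbor computation, and that shared step is what makes a combined driver plausible.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical single driver for three neighborhood-similarity scores (not Gelly code). */
final class SimilarityDriver {

    /** Neighbors shared by u and v in an undirected adjacency map. */
    static Set<Integer> common(Map<Integer, Set<Integer>> adj, int u, int v) {
        Set<Integer> shared = new HashSet<>(adj.get(u));
        shared.retainAll(adj.get(v));
        return shared;
    }

    /** Dispatches on the algorithm name; assumes u != v and both vertices are in the map. */
    static double score(String algorithm, Map<Integer, Set<Integer>> adj, int u, int v) {
        Set<Integer> shared = common(adj, u, v);
        switch (algorithm) {
            case "CommonNeighbors":
                return shared.size();
            case "JaccardIndex": {
                // |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
                int union = adj.get(u).size() + adj.get(v).size() - shared.size();
                return union == 0 ? 0.0 : (double) shared.size() / union;
            }
            case "AdamicAdar": {
                // Sum of 1 / ln(degree) over shared neighbors; each has degree >= 2 when u != v.
                double sum = 0.0;
                for (int w : shared) {
                    sum += 1.0 / Math.log(adj.get(w).size());
                }
                return sum;
            }
            default:
                throw new IllegalArgumentException("Unknown algorithm: " + algorithm);
        }
    }

    public static void main(String[] args) {
        // Tiny undirected graph with edges 1-3, 1-4, 2-3, 2-4, 3-5.
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        adj.put(1, new HashSet<>(Arrays.asList(3, 4)));
        adj.put(2, new HashSet<>(Arrays.asList(3, 4)));
        adj.put(3, new HashSet<>(Arrays.asList(1, 2, 5)));
        adj.put(4, new HashSet<>(Arrays.asList(1, 2)));
        adj.put(5, new HashSet<>(Arrays.asList(3)));
        for (String name : new String[] {"CommonNeighbors", "JaccardIndex", "AdamicAdar"}) {
            System.out.println(name + "(1, 2) = " + score(name, adj, 1, 2));
        }
    }
}
```

A real driver would presumably take the algorithm name from the command line and delegate to the corresponding library method rather than compute the scores itself.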
[jira] [Commented] (FLINK-4204) Clean up gelly-examples
[ https://issues.apache.org/jira/browse/FLINK-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595656#comment-15595656 ]

ASF GitHub Bot commented on FLINK-4204:
---

Github user vasia commented on the issue:

https://github.com/apache/flink/pull/2670

Hi @greghogan,
I really like the cleanup and new organization! Two thoughts:
- is the plan to add drivers for all library methods?
- shall we remove the `GraphMetrics` example since there is a better driver?
[jira] [Commented] (FLINK-4204) Clean up gelly-examples
[ https://issues.apache.org/jira/browse/FLINK-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593069#comment-15593069 ]

ASF GitHub Bot commented on FLINK-4204:
---

GitHub user greghogan opened a pull request:

https://github.com/apache/flink/pull/2670

[FLINK-4204] [gelly] Clean up gelly-examples

Moves drivers into separate package. Adds default main class to print usage listing included classes. Includes documentation for running Gelly examples.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/greghogan/flink 4204_clean_up_gelly_examples

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2670.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2670

commit 642267c70f362ce5414838aaddbed0dcd6b60934
Author: Greg Hogan
Date: 2016-08-24T15:32:43Z

[FLINK-4204] [gelly] Clean up gelly-examples

Moves drivers into separate package. Adds default main class to print usage listing included classes. Includes documentation for running Gelly examples.
[jira] [Commented] (FLINK-4204) Clean up gelly-examples
[ https://issues.apache.org/jira/browse/FLINK-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376917#comment-15376917 ]

Greg Hogan commented on FLINK-4204:
---

Like Steve Jobs' iPods, I think Gelly should ship with charged batteries [https://www.ted.com/talks/tony_fadell_the_first_secret_of_design_is_noticing/transcript]. We should make it as easy as possible for new users (not necessarily developers) to run algorithms on data and to perceive the power of Flink.

We can also continue to refactor and condense the drivers to reduce the lines of code. I hadn't done this because 1) we're still settling on the standard functionality and 2) there is more functionality to be added, such as edge weights.

I do question whether TriangleListing is a useful standalone algorithm. For counting triangles, ClusteringCoefficient can be used.
[jira] [Commented] (FLINK-4204) Clean up gelly-examples
[ https://issues.apache.org/jira/browse/FLINK-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376834#comment-15376834 ]

Vasia Kalavri commented on FLINK-4204:
---

Separating drivers and examples sounds like a good idea. Do you think we should add drivers for every library algorithm? Isn't it enough to provide 1-2 examples and have good documentation about how users can write their own drivers?
[jira] [Commented] (FLINK-4204) Clean up gelly-examples
[ https://issues.apache.org/jira/browse/FLINK-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374877#comment-15374877 ]

Greg Hogan commented on FLINK-4204:
---

We should also add a {{Main-Class}} to {{flink-gelly-examples}} to print usage for running the drivers and example programs. Currently the only ways to discover the available classes are to read the source or to list the classes in the jar.
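A usage-printing {{Main-Class}} of the kind suggested above might be sketched as follows. This is only an illustration under assumptions: the class name `ExamplesUsage` and the listed entries are placeholders, not the classes actually shipped in the jar, and the usage line assumes the jar is submitted through the Flink command line with `flink run -c`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical default Main-Class for an examples jar: it only prints which
 * classes a user can run. Names and entries are placeholders.
 */
final class ExamplesUsage {

    /** Builds the usage text from a map of class name to one-line description. */
    static String usage(Map<String, String> programs) {
        StringBuilder sb = new StringBuilder();
        sb.append("Usage: flink run -c <class> <examples jar> [arguments]\n");
        sb.append("Available classes:\n");
        for (Map.Entry<String, String> entry : programs.entrySet()) {
            sb.append("  ").append(entry.getKey())
              .append(" : ").append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the listing in insertion order.
        Map<String, String> programs = new LinkedHashMap<>();
        programs.put("org.example.driver.SomeDriver", "placeholder driver entry");
        programs.put("org.example.SomeExample", "placeholder example entry");
        System.out.print(usage(programs));
    }
}
```

Naming such a class in the `Main-Class` attribute of the jar manifest means that running the jar without selecting a class prints the listing, so users would not need to read the source or inspect the jar contents.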
[jira] [Commented] (FLINK-4204) Clean up gelly-examples
[ https://issues.apache.org/jira/browse/FLINK-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374064#comment-15374064 ]

Greg Hogan commented on FLINK-4204:
---

I think there is a strong case for providing both 1) drivers and 2) examples. The drivers are a nice way to kick the tires, to run the algorithms on actual data, and to serve as examples for using the library methods. The example algorithms, as you note, illustrate the APIs.

The {{provided}} scoping that was discussed in February forced the executable code into the separate examples module. I think it would be helpful to namespace the drivers into {{o.a.f.graph.examples.driver}}. It would also help to provide some documentation under "Using Gelly" for running a job.

It's nice to consolidate algorithms where possible; for example, ClusteringCoefficient computes both local and global coefficients for directed and undirected graphs.

I like seeing three variants of, for example, SSSP, as the comparison makes a useful example. I'd prefer to clean these up a little so that the examples demonstrate performant code and can run on a large data set out of the box.
[jira] [Commented] (FLINK-4204) Clean up gelly-examples
[ https://issues.apache.org/jira/browse/FLINK-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373767#comment-15373767 ]

Vasia Kalavri commented on FLINK-4204:
---

[~greghogan] let me know what you think!