Hi Miguel, I don't think you're doing anything wrong. The NaN values you are getting are there because of your data. The LCC value is computed as #number_of_triangles / #number_of_triples, where #number_of_triples is [n*(n-1)]/2 for a vertex with n neighbors. It looks like there are no triangles in your graph, and only one vertex has more than one neighbor.
Cheers, -Vasia. On 21 January 2017 at 16:46, Miguel Coimbra <miguel.e.coim...@gmail.com> wrote: > Hello Vasia and Greg, > > Thank you for the feedback! > > I am probably misusing the Gelly API in some way, but I thought I could > run the undirected version after calling getUndirected()? > While not going into the concept of local clustering coefficients, I > thought that from a Gelly API point-of-view, both my code and data set were > properly established. > However: > - I believe that the graph was already undirected; > - I am getting NaN results after executing the algorithm. > > This is the code I am using to obtain an (undirected) graph instance upon > which I call LocalClusteringCoefficient: > > > import org.apache.flink.graph.library.clustering.undirected. > LocalClusteringCoefficient.Result; > import org.apache.flink.graph.library.clustering.undirected. > LocalClusteringCoefficient; > /** other imports and method definitions **/ > > // Generate edge tuples from the input file. > final DataSet<Tuple2<LongValue, LongValue>> edgeTuples = > env.readCsvFile(inputPath) > .fieldDelimiter("\t") // node IDs are separated by spaces > .ignoreComments("#") // comments start with "%" > .types(LongValue.class, LongValue.class); > > // Generate actual Edge<Long, Double> instances. > @SuppressWarnings("serial") > final DataSet<Edge<LongValue, Double>> edges = edgeTuples.map( > new MapFunction<Tuple2<LongValue, LongValue>, Edge<LongValue, > Double>>() { > @Override > public Edge<LongValue, Double> map(Tuple2<LongValue, LongValue> > arg0) throws Exception { > return new Edge<LongValue, Double>(arg0.f0, arg0.f1, 1.0d); > } > }); > > // Generate the basic graph. > @SuppressWarnings("serial") > final Graph<LongValue, Double, Double> graph = Graph.fromDataSet( > edges, > new MapFunction<LongValue, Double>() { > @Override > public Double map(LongValue arg0) throws Exception { > // For testing purposes, just setting each vertex value to 1.0. > return 1.0; > } > }, > env).*getUndirected(*); > > // Execute the LocalClusteringCoefficient algorithm. > final DataSet<Result<LongValue>> localClusteringCoefficients = > graph.run(new LocalClusteringCoefficient<LongValue, Double, Double>()); > > // Get the values as per Vasia's help: > @SuppressWarnings("serial") > DataSet<Double> *CLUSTERING_COEFFICIENTS* = > localClusteringCoefficients.map(new > MapFunction<Result<LongValue>, Double>() { > @Override > public Double map(Result<LongValue> arg0) throws Exception { > return arg0.getLocalClusteringCoefficientScore(); > } > }); > > I believe this is the correct way to get a DataSet<Double> of > coefficients from a DataSet<Result<LongValue>> ? > Among the coefficients are a lot of NaN values: > > *CLUSTERING_COEFFICIENTS*.print(); > > NaN > 0.0 > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > NaN > > Apologies for the verbosity in advance, but just to provide detail, > printing the graph edges yields this (notice that for each pair or vertices > there are two links, which are original and the reverse version derived > from getUndirected()). > > *Greg:* I therefore believe the *graph is undirected*: > > graph.getEdgesAsTuple3().print(); > (5113,6008,1.0) > (6008,5113,1.0) > (5113,6774,1.0) > (6774,5113,1.0) > (5113,32938,1.0) > (32938,5113,1.0) > (5113,6545,1.0) > (6545,5113,1.0) > (5113,7088,1.0) > (7088,5113,1.0) > (5113,37929,1.0) > (37929,5113,1.0) > (5113,26562,1.0) > (26562,5113,1.0) > (5113,6107,1.0) > (6107,5113,1.0) > (5113,7171,1.0) > (7171,5113,1.0) > (5113,6192,1.0) > (6192,5113,1.0) > (5113,7763,1.0) > (7763,5113,1.0) > (9748,5113,1.0) > (5113,9748,1.0) > (10191,5113,1.0) > (5113,10191,1.0) > (6064,5113,1.0) > (5113,6064,1.0) > (6065,5113,1.0) > (5113,6065,1.0) > (6279,5113,1.0) > (5113,6279,1.0) > (4907,5113,1.0) > (5113,4907,1.0) > (6465,5113,1.0) > (5113,6465,1.0) > (6707,5113,1.0) > (5113,6707,1.0) > (7089,5113,1.0) > (5113,7089,1.0) > (7172,5113,1.0) > (5113,7172,1.0) > (14310,5113,1.0) > (5113,14310,1.0) > (6252,5113,1.0) > (5113,6252,1.0) > (33855,5113,1.0) > (5113,33855,1.0) > (7976,5113,1.0) > (5113,7976,1.0) > (26284,5113,1.0) > (5113,26284,1.0) > (8056,5113,1.0) > (5113,8056,1.0) > (10371,5113,1.0) > (5113,10371,1.0) > (16785,5113,1.0) > (5113,16785,1.0) > (19801,5113,1.0) > (5113,19801,1.0) > (6715,5113,1.0) > (5113,6715,1.0) > (31724,5113,1.0) > (5113,31724,1.0) > (32443,5113,1.0) > (5113,32443,1.0) > (10370,5113,1.0) > (5113,10370,1.0) > > Any insight into what I may be doing wrong would be greatly appreciated. > > Thanks for your time, > > Kind regards, > > Miguel E. Coimbra > Email: miguel.e.coim...@gmail.com <miguel.e.coim...@ist.utl.pt> > Skype: miguel.e.coimbra > > On 20 January 2017 at 19:31, Greg Hogan <c...@greghogan.com> wrote: > >> Hi Miguel, >> >> The '--output print' option describes the values and also displays the >> local clustering coefficient value. >> >> You're running the undirected algorithm on a directed graph. In 1.2 there >> is an option '--simplify true' that will add reverse edges and remove >> duplicate edges and self-loops. Alternatively, it looks like you could >> simply add reverse edges to your input file (with an optional ' | sort | >> uniq' following): >> >> $ cat edges.txt | awk ' { print $1, $2; print $2, $1 } ' >> >> The drivers are being reworked for 1.3 to better reuse code and options >> which will better support additional drivers and algorithms and make >> documentation simpler. >> >> Greg >> >> On Fri, Jan 20, 2017 at 2:06 PM, Vasiliki Kalavri < >> vasilikikala...@gmail.com> wrote: >> >>> Hi Miguel, >>> >>> the LocalClusteringCoefficient algorithm returns a DataSet of type >>> Result, which basically wraps a vertex id, its degree, and the number >>> of triangles containing this vertex. The number 11 you see is indeed the >>> degree of vertex 5113. The Result type contains the method >>> getLocalClusteringCoefficientScore() which allows you to retrieve the >>> clustering coefficient score for a vertex. The method simply divides the >>> numbers of triangles by the number of potential edges between neighbors. >>> >>> I'm sorry that you this is not clear in the docs. We should definitely >>> improve them to explain what is the output and how to retrieve the actual >>> clustering coefficient values. I have opened a JIRA for this [1]. >>> >>> Cheers, >>> -Vasia. >>> >>> [1]: https://issues.apache.org/jira/browse/FLINK-5597 >>> >>> On 20 January 2017 at 19:31, Miguel Coimbra <miguel.e.coim...@gmail.com> >>> wrote: >>> >>>> Hello, >>>> >>>> In the documentation of the LocalClusteringCoefficient algorithm, it >>>> is said: >>>> >>>> >>>> *The local clustering coefficient measures the connectedness of each >>>> vertex’s neighborhood.Scores range from 0.0 (no edges between neighbors) to >>>> 1.0 (neighborhood is a clique).* >>>> >>>> https://ci.apache.org/projects/flink/flink-docs-release-1.1/ >>>> apis/batch/libs/gelly.html#local-clustering-coefficient >>>> <https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/gelly/library_methods.html#local-clustering-coefficient> >>>> >>>> However, upon running the algorithm (undirected version), I obtained >>>> values above 1. >>>> >>>> The result I got was this. As you can see, vertex 5113 has a score of >>>> 11: >>>> (the input edges for the graph are shown further below - around *35 >>>> edges*): >>>> >>>> (4907,(1,0)) >>>> *(5113,(11,0))* >>>> (6008,(0,0)) >>>> (6064,(1,0)) >>>> (6065,(1,0)) >>>> (6107,(0,0)) >>>> (6192,(0,0)) >>>> (6252,(1,0)) >>>> (6279,(1,0)) >>>> (6465,(1,0)) >>>> (6545,(0,0)) >>>> (6707,(1,0)) >>>> (6715,(1,0)) >>>> (6774,(0,0)) >>>> (7088,(0,0)) >>>> (7089,(1,0)) >>>> (7171,(0,0)) >>>> (7172,(1,0)) >>>> (7763,(0,0)) >>>> (7976,(1,0)) >>>> (8056,(1,0)) >>>> (9748,(1,0)) >>>> (10191,(1,0)) >>>> (10370,(1,0)) >>>> (10371,(1,0)) >>>> (14310,(1,0)) >>>> (16785,(1,0)) >>>> (19801,(1,0)) >>>> (26284,(1,0)) >>>> (26562,(0,0)) >>>> (31724,(1,0)) >>>> (32443,(1,0)) >>>> (32938,(0,0)) >>>> (33855,(1,0)) >>>> (37929,(0,0)) >>>> >>>> This was from a small isolated test with these edges: >>>> >>>> 5113 6008 >>>> 5113 6774 >>>> 5113 32938 >>>> 5113 6545 >>>> 5113 7088 >>>> 5113 37929 >>>> 5113 26562 >>>> 5113 6107 >>>> 5113 7171 >>>> 5113 6192 >>>> 5113 7763 >>>> 9748 5113 >>>> 10191 5113 >>>> 6064 5113 >>>> 6065 5113 >>>> 6279 5113 >>>> 4907 5113 >>>> 6465 5113 >>>> 6707 5113 >>>> 7089 5113 >>>> 7172 5113 >>>> 14310 5113 >>>> 6252 5113 >>>> 33855 5113 >>>> 7976 5113 >>>> 26284 5113 <262%20845%20113> >>>> 8056 5113 >>>> 10371 5113 >>>> 16785 5113 >>>> 19801 5113 >>>> 6715 5113 >>>> 31724 5113 >>>> 32443 5113 >>>> 10370 5113 >>>> >>>> I am not sure what I may be doing wrong, but is there perhaps some form >>>> of normalization lacking in my execution of: >>>> >>>> org.apache.flink.graph.library.clustering.undirected.LocalCl >>>> usteringCoefficient.Result; >>>> org.apache.flink.graph.library.clustering.undirected.LocalCl >>>> usteringCoefficient; >>>> >>>> Am I supposed to divide all scores by the greatest score obtained by >>>> the algorithm? >>>> >>>> Thank you very much! >>>> >>>> Miguel E. Coimbra >>>> Email: miguel.e.coim...@gmail.com <miguel.e.coim...@ist.utl.pt> >>>> Skype: miguel.e.coimbra >>>> >>> >>> >> >