Miguel and Vasia, My thought is to change the example drivers to "print" verbose strings to the console, for example: Vertex ID: 0, vertex degree: 42, triangle count: 7, local clustering coefficient: 0.00406504
Output to CSV will still be the compact tuple representations which do not include derived scores. Also, getUndirected only creates a flipped duplicate of each edge. There are directed and undirected "Simplify" algorithms that also remove the duplicates. Greg On Mon, Jan 23, 2017 at 5:40 AM, Vasiliki Kalavri <vasilikikala...@gmail.com > wrote: > Hi Miguel, > > I don't think you're doing anything wrong. The NaN values you are getting > are there because of your data. The LCC value is computed as > #number_of_triangles / #number_of_triples, where #number_of_triples is > [n*(n-1)]/2 for a vertex with n neighbors. It looks like there are no > triangles in your graph, and only one vertex has more than one neighbor. > > Cheers, > -Vasia. > > On 21 January 2017 at 16:46, Miguel Coimbra <miguel.e.coim...@gmail.com> > wrote: > >> Hello Vasia and Greg, >> >> Thank you for the feedback! >> >> I am probably misusing the Gelly API in some way, but I thought I could >> run the undirected version after calling getUndirected()? >> While not going into the concept of local clustering coefficients, I >> thought that from a Gelly API point-of-view, both my code and data set were >> properly established. >> However: >> - I believe that the graph was already undirected; >> - I am getting NaN results after executing the algorithm. >> >> This is the code I am using to obtain an (undirected) graph instance upon >> which I call LocalClusteringCoefficient: >> >> >> import org.apache.flink.graph.library.clustering.undirected.LocalCl >> usteringCoefficient.Result; >> import org.apache.flink.graph.library.clustering.undirected.LocalCl >> usteringCoefficient; >> /** other imports and method definitions **/ >> >> // Generate edge tuples from the input file. >> final DataSet<Tuple2<LongValue, LongValue>> edgeTuples = >> env.readCsvFile(inputPath) >> .fieldDelimiter("\t") // node IDs are separated by spaces >> .ignoreComments("#") // comments start with "%" >> .types(LongValue.class, LongValue.class); >> >> // Generate actual Edge<Long, Double> instances. >> @SuppressWarnings("serial") >> final DataSet<Edge<LongValue, Double>> edges = edgeTuples.map( >> new MapFunction<Tuple2<LongValue, LongValue>, Edge<LongValue, >> Double>>() { >> @Override >> public Edge<LongValue, Double> map(Tuple2<LongValue, LongValue> >> arg0) throws Exception { >> return new Edge<LongValue, Double>(arg0.f0, arg0.f1, 1.0d); >> } >> }); >> >> // Generate the basic graph. >> @SuppressWarnings("serial") >> final Graph<LongValue, Double, Double> graph = Graph.fromDataSet( >> edges, >> new MapFunction<LongValue, Double>() { >> @Override >> public Double map(LongValue arg0) throws Exception { >> // For testing purposes, just setting each vertex value to >> 1.0. >> return 1.0; >> } >> }, >> env).*getUndirected(*); >> >> // Execute the LocalClusteringCoefficient algorithm. >> final DataSet<Result<LongValue>> localClusteringCoefficients = >> graph.run(new LocalClusteringCoefficient<LongValue, Double, Double>()); >> >> // Get the values as per Vasia's help: >> @SuppressWarnings("serial") >> DataSet<Double> *CLUSTERING_COEFFICIENTS* = >> localClusteringCoefficients.map(new MapFunction<Result<LongValue>, >> Double>() { >> @Override >> public Double map(Result<LongValue> arg0) throws Exception { >> return arg0.getLocalClusteringCoefficientScore(); >> } >> }); >> >> I believe this is the correct way to get a DataSet<Double> of >> coefficients from a DataSet<Result<LongValue>> ? >> Among the coefficients are a lot of NaN values: >> >> *CLUSTERING_COEFFICIENTS*.print(); >> >> NaN >> 0.0 >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> NaN >> >> Apologies for the verbosity in advance, but just to provide detail, >> printing the graph edges yields this (notice that for each pair or vertices >> there are two links, which are original and the reverse version derived >> from getUndirected()). >> >> *Greg:* I therefore believe the *graph is undirected*: >> >> graph.getEdgesAsTuple3().print(); >> (5113,6008,1.0) >> (6008,5113,1.0) >> (5113,6774,1.0) >> (6774,5113,1.0) >> (5113,32938,1.0) >> (32938,5113,1.0) >> (5113,6545,1.0) >> (6545,5113,1.0) >> (5113,7088,1.0) >> (7088,5113,1.0) >> (5113,37929,1.0) >> (37929,5113,1.0) >> (5113,26562,1.0) >> (26562,5113,1.0) >> (5113,6107,1.0) >> (6107,5113,1.0) >> (5113,7171,1.0) >> (7171,5113,1.0) >> (5113,6192,1.0) >> (6192,5113,1.0) >> (5113,7763,1.0) >> (7763,5113,1.0) >> (9748,5113,1.0) >> (5113,9748,1.0) >> (10191,5113,1.0) >> (5113,10191,1.0) >> (6064,5113,1.0) >> (5113,6064,1.0) >> (6065,5113,1.0) >> (5113,6065,1.0) >> (6279,5113,1.0) >> (5113,6279,1.0) >> (4907,5113,1.0) >> (5113,4907,1.0) >> (6465,5113,1.0) >> (5113,6465,1.0) >> (6707,5113,1.0) >> (5113,6707,1.0) >> (7089,5113,1.0) >> (5113,7089,1.0) >> (7172,5113,1.0) >> (5113,7172,1.0) >> (14310,5113,1.0) >> (5113,14310,1.0) >> (6252,5113,1.0) >> (5113,6252,1.0) >> (33855,5113,1.0) >> (5113,33855,1.0) >> (7976,5113,1.0) >> (5113,7976,1.0) >> (26284,5113,1.0) >> (5113,26284,1.0) >> (8056,5113,1.0) >> (5113,8056,1.0) >> (10371,5113,1.0) >> (5113,10371,1.0) >> (16785,5113,1.0) >> (5113,16785,1.0) >> (19801,5113,1.0) >> (5113,19801,1.0) >> (6715,5113,1.0) >> (5113,6715,1.0) >> (31724,5113,1.0) >> (5113,31724,1.0) >> (32443,5113,1.0) >> (5113,32443,1.0) >> (10370,5113,1.0) >> (5113,10370,1.0) >> >> Any insight into what I may be doing wrong would be greatly appreciated. >> >> Thanks for your time, >> >> Kind regards, >> >> Miguel E. Coimbra >> Email: miguel.e.coim...@gmail.com <miguel.e.coim...@ist.utl.pt> >> Skype: miguel.e.coimbra >> >> On 20 January 2017 at 19:31, Greg Hogan <c...@greghogan.com> wrote: >> >>> Hi Miguel, >>> >>> The '--output print' option describes the values and also displays the >>> local clustering coefficient value. >>> >>> You're running the undirected algorithm on a directed graph. In 1.2 >>> there is an option '--simplify true' that will add reverse edges and remove >>> duplicate edges and self-loops. Alternatively, it looks like you could >>> simply add reverse edges to your input file (with an optional ' | sort | >>> uniq' following): >>> >>> $ cat edges.txt | awk ' { print $1, $2; print $2, $1 } ' >>> >>> The drivers are being reworked for 1.3 to better reuse code and options >>> which will better support additional drivers and algorithms and make >>> documentation simpler. >>> >>> Greg >>> >>> On Fri, Jan 20, 2017 at 2:06 PM, Vasiliki Kalavri < >>> vasilikikala...@gmail.com> wrote: >>> >>>> Hi Miguel, >>>> >>>> the LocalClusteringCoefficient algorithm returns a DataSet of type >>>> Result, which basically wraps a vertex id, its degree, and the number >>>> of triangles containing this vertex. The number 11 you see is indeed the >>>> degree of vertex 5113. The Result type contains the method >>>> getLocalClusteringCoefficientScore() which allows you to retrieve the >>>> clustering coefficient score for a vertex. The method simply divides the >>>> numbers of triangles by the number of potential edges between neighbors. >>>> >>>> I'm sorry that you this is not clear in the docs. We should definitely >>>> improve them to explain what is the output and how to retrieve the actual >>>> clustering coefficient values. I have opened a JIRA for this [1]. >>>> >>>> Cheers, >>>> -Vasia. >>>> >>>> [1]: https://issues.apache.org/jira/browse/FLINK-5597 >>>> >>>> On 20 January 2017 at 19:31, Miguel Coimbra <miguel.e.coim...@gmail.com >>>> > wrote: >>>> >>>>> Hello, >>>>> >>>>> In the documentation of the LocalClusteringCoefficient algorithm, it >>>>> is said: >>>>> >>>>> >>>>> *The local clustering coefficient measures the connectedness of each >>>>> vertex’s neighborhood.Scores range from 0.0 (no edges between neighbors) >>>>> to >>>>> 1.0 (neighborhood is a clique).* >>>>> >>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.1/ >>>>> apis/batch/libs/gelly.html#local-clustering-coefficient >>>>> <https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/gelly/library_methods.html#local-clustering-coefficient> >>>>> >>>>> However, upon running the algorithm (undirected version), I obtained >>>>> values above 1. >>>>> >>>>> The result I got was this. As you can see, vertex 5113 has a score of >>>>> 11: >>>>> (the input edges for the graph are shown further below - around *35 >>>>> edges*): >>>>> >>>>> (4907,(1,0)) >>>>> *(5113,(11,0))* >>>>> (6008,(0,0)) >>>>> (6064,(1,0)) >>>>> (6065,(1,0)) >>>>> (6107,(0,0)) >>>>> (6192,(0,0)) >>>>> (6252,(1,0)) >>>>> (6279,(1,0)) >>>>> (6465,(1,0)) >>>>> (6545,(0,0)) >>>>> (6707,(1,0)) >>>>> (6715,(1,0)) >>>>> (6774,(0,0)) >>>>> (7088,(0,0)) >>>>> (7089,(1,0)) >>>>> (7171,(0,0)) >>>>> (7172,(1,0)) >>>>> (7763,(0,0)) >>>>> (7976,(1,0)) >>>>> (8056,(1,0)) >>>>> (9748,(1,0)) >>>>> (10191,(1,0)) >>>>> (10370,(1,0)) >>>>> (10371,(1,0)) >>>>> (14310,(1,0)) >>>>> (16785,(1,0)) >>>>> (19801,(1,0)) >>>>> (26284,(1,0)) >>>>> (26562,(0,0)) >>>>> (31724,(1,0)) >>>>> (32443,(1,0)) >>>>> (32938,(0,0)) >>>>> (33855,(1,0)) >>>>> (37929,(0,0)) >>>>> >>>>> This was from a small isolated test with these edges: >>>>> >>>>> 5113 6008 >>>>> 5113 6774 >>>>> 5113 32938 >>>>> 5113 6545 >>>>> 5113 7088 >>>>> 5113 37929 >>>>> 5113 26562 >>>>> 5113 6107 >>>>> 5113 7171 >>>>> 5113 6192 >>>>> 5113 7763 >>>>> 9748 5113 >>>>> 10191 5113 >>>>> 6064 5113 >>>>> 6065 5113 >>>>> 6279 5113 >>>>> 4907 5113 >>>>> 6465 5113 >>>>> 6707 5113 >>>>> 7089 5113 >>>>> 7172 5113 >>>>> 14310 5113 >>>>> 6252 5113 >>>>> 33855 5113 >>>>> 7976 5113 >>>>> 26284 5113 <262%20845%20113> >>>>> 8056 5113 >>>>> 10371 5113 >>>>> 16785 5113 >>>>> 19801 5113 >>>>> 6715 5113 >>>>> 31724 5113 >>>>> 32443 5113 >>>>> 10370 5113 >>>>> >>>>> I am not sure what I may be doing wrong, but is there perhaps some >>>>> form of normalization lacking in my execution of: >>>>> >>>>> org.apache.flink.graph.library.clustering.undirected.LocalCl >>>>> usteringCoefficient.Result; >>>>> org.apache.flink.graph.library.clustering.undirected.LocalCl >>>>> usteringCoefficient; >>>>> >>>>> Am I supposed to divide all scores by the greatest score obtained by >>>>> the algorithm? >>>>> >>>>> Thank you very much! >>>>> >>>>> Miguel E. Coimbra >>>>> Email: miguel.e.coim...@gmail.com <miguel.e.coim...@ist.utl.pt> >>>>> Skype: miguel.e.coimbra >>>>> >>>> >>>> >>> >> >