Re: GraphX Pagerank application

2014-08-15 Thread Ankur Dave
On Wed, Aug 6, 2014 at 11:37 AM, AlexanderRiggers 
alexander.rigg...@gmail.com wrote:

> To perform the page rank I have to create a graph object, adding the edges
> by setting sourceID=id and dstID=brand. In GraphLab there is a function: g =
> SGraph().add_edges(data, src_field='id', dst_field='brand')
>
> Is there something similar in GraphX?


It sounds like you're trying to parse an edge list file into a graph, where
each line is a comma-separated pair of numeric vertex ids. There's a
built-in parser for whitespace-separated pairs (see
GraphLoader.edgeListFile), and it should be easy to adapt that to
comma-separated pairs. You can also drop the header line using RDD#filter
(and eventually using https://github.com/apache/spark/pull/1839).
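
As a rough illustration of that suggestion, here is a minimal sketch rather than a definitive implementation. It assumes the header line is exactly "id,brand", that every other line holds two integers, and that id and brand values do not overlap (if they do, the two sides of the bipartite graph would need separate id ranges). The names CsvPageRank and parseEdge, the input path taken from args(0), and the 0.001 tolerance are made up for the example; Graph.fromEdges and pageRank are GraphX calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}
import scala.util.Try

object CsvPageRank {
  // Hypothetical helper: turn one "src,dst" line into an edge with a dummy
  // attribute of 1. Returns None if the line is not two integers.
  def parseEdge(line: String): Option[Edge[Int]] =
    line.split(",").map(_.trim) match {
      case Array(src, dst) => Try(Edge(src.toLong, dst.toLong, 1)).toOption
      case _               => None
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CsvPageRank"))

    // Drop the header with RDD#filter, then parse the remaining lines.
    val edges = sc.textFile(args(0))
      .filter(line => line != "id,brand") // assumed header text
      .flatMap(line => parseEdge(line).toList)

    // Vertices are created implicitly from the edge endpoints.
    val graph = Graph.fromEdges(edges, defaultValue = 1)

    // Run PageRank until the ranks change by less than the tolerance.
    val ranks = graph.pageRank(0.001).vertices
    ranks.take(10).foreach(println)

    sc.stop()
  }
}
```

Lines that fail to parse are silently skipped here; for a real 3GB input it may be worth counting them instead.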

Ankur http://www.ankurdave.com/


GraphX Pagerank application

2014-08-06 Thread AlexanderRiggers
I want to run PageRank on a 3GB text file that contains a bipartite edge
list with the columns id and brand.

Example:
id,brand
86246,15343
86246,27873
86246,14647
86246,55172
86246,3293
86246,2820
86246,3830
86246,2820
86246,5603
86246,72482

To perform the page rank I have to create a graph object, adding the edges
by setting sourceID=id and dstID=brand. In GraphLab there is a function: g =
SGraph().add_edges(data, src_field='id', dst_field='brand')

Is there something similar in GraphX? In the GraphX docs there is an example
where a separate edge list and usernames are joined, but I couldn't find an
example that matches my problem.





