Hey all,
I want to load a parquet file containing my edges into a GraphX Graph. My code
so far looks like this:

val edgesDF = spark.read.parquet("/path/to/edges/parquet/")
val edgesRDD = edgesDF.rdd
val graph = Graph.fromEdgeTuples(edgesRDD, 1)

But this produces an error:

[error]  found   : org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]
[error]  required: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId, org.apache.spark.graphx.VertexId)]
[error]     (which expands to)  org.apache.spark.rdd.RDD[(Long, Long)]
[error] Error occurred in an application involving default arguments.
[error]         val graph = Graph.fromEdgeTuples(edgesRDD, 1)

I tried to declare edgesRDD with an explicit type, but this just moves the
error:

val edgesDF = spark.read.parquet("/path/to/edges/parquet/")
val edgesRDD: RDD[(Long, Long)] = edgesDF.rdd
val graph = Graph.fromEdgeTuples(edgesRDD, 1)
[error] /home/alex/ownCloud/JupyterNotebooks/Diss_scripte/Webgraph_analysis/pagerankscala/src/main/scala/pagerank.scala:17:44: type mismatch;
[error]  found   : org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]
[error]  required: org.apache.spark.rdd.RDD[(Long, Long)]
[error]         val edgesRDD : RDD[(Long,Long)] = edgesDF.rdd

So I guess I have to transform
org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] into
org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId, org.apache.spark.graphx.VertexId)]
(which expands to org.apache.spark.rdd.RDD[(Long, Long)]).

How can I achieve this?
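In case it helps, the direction I was considering is mapping each Row to a pair
of Longs before building the graph. This is only a sketch and not tested; it
assumes the parquet has exactly two Long columns, and "src"/"dst" are
placeholder column names, not the real schema:

```scala
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Read the edge list; each Row is assumed to hold two Long columns.
val edgesDF = spark.read.parquet("/path/to/edges/parquet/")

// Convert RDD[Row] to RDD[(VertexId, VertexId)] by extracting the two
// columns from each Row. "src" and "dst" are hypothetical column names.
val edgesRDD: RDD[(VertexId, VertexId)] = edgesDF.rdd
  .map(row => (row.getAs[Long]("src"), row.getAs[Long]("dst")))

// fromEdgeTuples accepts RDD[(VertexId, VertexId)] plus a default
// vertex attribute (here: 1).
val graph = Graph.fromEdgeTuples(edgesRDD, 1)
```

Is a Row-by-Row map like this the right approach, or is there a more direct way?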
