Hey all, I want to load a Parquet file containing my edges into a GraphX `Graph`. My code so far looks like this:
```scala
val edgesDF = spark.read.parquet("/path/to/edges/parquet/")
val edgesRDD = edgesDF.rdd
val graph = Graph.fromEdgeTuples(edgesRDD, 1)
```

But this produces an error:

```
[error]  found   : org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]
[error]  required: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId, org.apache.spark.graphx.VertexId)]
[error]     (which expands to)  org.apache.spark.rdd.RDD[(Long, Long)]
[error] Error occurred in an application involving default arguments.
[error]     val graph = Graph.fromEdgeTuples(edgesRDD, 1)
```

I tried declaring `edgesRDD` with an explicit type, but that just moves the error:

```scala
val edgesDF = spark.read.parquet("/path/to/edges/parquet/")
val edgesRDD: RDD[(Long, Long)] = edgesDF.rdd
val graph = Graph.fromEdgeTuples(edgesRDD, 1)
```

```
[error] /home/alex/ownCloud/JupyterNotebooks/Diss_scripte/Webgraph_analysis/pagerankscala/src/main/scala/pagerank.scala:17:44: type mismatch;
[error]  found   : org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]
[error]  required: org.apache.spark.rdd.RDD[(Long, Long)]
[error]     val edgesRDD : RDD[(Long,Long)] = edgesDF.rdd
```

So I guess I have to transform the `org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]` into an `org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId, org.apache.spark.graphx.VertexId)]` (which expands to `org.apache.spark.rdd.RDD[(Long, Long)]`). How can I achieve this?
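My guess is that I need to `map` over each `Row` to extract the two vertex IDs. A sketch of what I have in mind, assuming the first two columns of the Parquet file are the `Long`-typed source and target IDs (I'm not sure this is the idiomatic way):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.{Graph, VertexId}

val edgesDF = spark.read.parquet("/path/to/edges/parquet/")

// Convert each Row to a (src, dst) tuple; getLong(i) reads column i as Long.
// If the columns aren't Long-typed, or are in a different order, this would
// need adjusting (e.g. row.getAs[Long]("src") by column name instead).
val edgesRDD: RDD[(VertexId, VertexId)] =
  edgesDF.rdd.map(row => (row.getLong(0), row.getLong(1)))

val graph = Graph.fromEdgeTuples(edgesRDD, 1)
```

Is a plain `map` over the `Row`s like this the right approach, or is there a better way to go from a DataFrame to the tuple RDD that `Graph.fromEdgeTuples` expects?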