Hi,

I am dealing with a graph of 20 million vertices and 2 billion edges. When I try to persist the graph, an exception is thrown:

    Caused by: java.lang.UnsupportedOperationException: Cannot change storage
    level of an RDD after it was already assigned a level

Here is my code (the HDFS paths and mapping functions are elided):

    def main(args: Array[String]) {
      if (args.length == 0) {
        System.err.println("Usage: Graph_on_Spark [master] <slices>")
        System.exit(1)
      }
      val sc = new SparkContext(args(0), "Graph_on_Spark",
        System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_EXAMPLES_JAR")))
      val hdfspath = ""
      var userRDD = sc.textFile(…)
      var edgeRDD: RDD[Edge[String]] = sc.textFile(…)
      for (no <- 1 to 4) {
        val vertexfile = sc.textFile(…)
        userRDD = userRDD.union(vertexfile.map { … })
        val edgefile = sc.textFile(…)
        edgeRDD = edgeRDD.union(…)
      }
      val graph = Graph(userRDD, edgeRDD, "Empty")
      println(graph.vertices.count)
      println(graph.edges.count)
      println("graph form success")
      val initialgraph = graph.persist(storage.StorageLevel.DISK_ONLY)
    }

I don't call cache or persist anywhere before this point.
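As far as I understand, that exception is Spark's general rule that an RDD's storage level cannot be changed once it has been assigned. A minimal illustration with a plain RDD (nothing GraphX-specific; the local master and object name are just placeholders):

    import org.apache.spark.SparkContext
    import org.apache.spark.storage.StorageLevel

    object StorageLevelRepro {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[2]", "StorageLevelRepro")
        val rdd = sc.parallelize(1 to 10)
        rdd.cache()                          // assigns MEMORY_ONLY
        rdd.persist(StorageLevel.DISK_ONLY)  // throws: Cannot change storage level
                                             // of an RDD after it was already assigned a level
        sc.stop()
      }
    }

So presumably something inside Graph(...) or the two counts has already cached the vertex/edge RDDs before my persist call, even though I never call cache or persist myself.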
Another question: when I execute the code below, I get an exception:

    Exception failure: java.lang.ArrayIndexOutOfBoundsException

    while (i < maxIter) {
      println("Iteration")
      println(g.vertices.count)
      val newVerts = g.vertices.innerJoin(messages)(pregel_vprog)
      g = g.outerJoinVertices(newVerts) { (vid, old, newOpt) =>
        newOpt.getOrElse((old._1, ""))
      }
      println(g.vertices.count)
      messages = g.mapReduceTriplets[String](pregel_sendMsg, pregel_mergeFunc,
        Some((newVerts, activeDir)))
      println(g.vertices.count)
      i += 1
    }
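In case it helps to reproduce, here is a minimal, self-contained version of the same loop on a toy graph. The vprog/sendMsg/merge bodies are hypothetical stand-ins for my real pregel_vprog, pregel_sendMsg, and pregel_mergeFunc:

    import org.apache.spark.SparkContext
    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    object PregelLoopRepro {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[2]", "PregelLoopRepro")

        // Toy graph: vertex attribute is a (counter, label) pair,
        // edge attribute is a String.
        val vertices: RDD[(VertexId, (Int, String))] =
          sc.parallelize(Seq((1L, (0, "a")), (2L, (0, "b")), (3L, (0, "c"))))
        val edges: RDD[Edge[String]] =
          sc.parallelize(Seq(Edge(1L, 2L, "e1"), Edge(2L, 3L, "e2")))
        var g = Graph(vertices, edges, (0, "empty"))

        // Hypothetical stand-ins for the real vertex program,
        // send-message, and merge-message functions.
        def vprog(id: VertexId, attr: (Int, String), msg: String) = (attr._1 + 1, msg)
        def sendMsg(t: EdgeTriplet[(Int, String), String]) =
          Iterator((t.dstId, t.srcAttr._2))
        def merge(a: String, b: String) = a + b

        val activeDir = EdgeDirection.Out
        var messages = g.mapReduceTriplets[String](sendMsg, merge)
        var i = 0
        val maxIter = 3
        while (i < maxIter) {
          val newVerts = g.vertices.innerJoin(messages)(vprog)
          g = g.outerJoinVertices(newVerts) { (vid, old, newOpt) =>
            newOpt.getOrElse((old._1, ""))
          }
          messages = g.mapReduceTriplets[String](sendMsg, merge,
            Some((newVerts, activeDir)))
          i += 1
        }
        println(g.vertices.collect.mkString(", "))
        sc.stop()
      }
    }

Thanks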