Hi,

I am trying GraphX on the LiveJournal data. I have a cluster of 17 computing nodes: 1 master and 16 workers. I have a few questions about this.

* I built Spark from spark-master (to avoid the partitionBy error in Spark 1.0).
* I am using edgeListFile() to load the data, and I figured I need to specify the number of partitions I want. The exact syntax I am using is:

    val graph = GraphLoader.edgeListFile(sc, "filepath", true, 64)
      .partitionBy(PartitionStrategy.RandomVertexCut)
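For reference, here is a sketch of how the load above could tie the partition count to the cluster's cores instead of a hard-coded 64. This assumes a running SparkContext `sc`; the worker and core counts are placeholders you would replace with your actual hardware (the 4 cores per worker below is a guess, not from the original post).

```scala
// Sketch only: assumes a live SparkContext `sc` on the cluster.
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

val numWorkers = 16
val coresPerWorker = 4  // hypothetical; set to your workers' real core count
// A common rule of thumb is ~2-3 tasks per core in the cluster.
val numPartitions = numWorkers * coresPerWorker * 2

val graph = GraphLoader
  .edgeListFile(sc, "filepath",
    canonicalOrientation = true,
    minEdgePartitions = numPartitions)
  .partitionBy(PartitionStrategy.RandomVertexCut)
  .cache() // avoid re-reading the edge list on every action
```

Whether RandomVertexCut is the best PartitionStrategy depends on the graph; the GraphX API also offers EdgePartition1D, EdgePartition2D, and CanonicalRandomVertexCut.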
-- Is this the correct way to load the file to get the best performance?
-- What should the number of partitions be: one per computing node, or one per core?
-- I see the following error many times in my logs:

    ERROR BlockManagerWorker: Exception handling buffer message
    java.io.NotSerializableException: org.apache.spark.graphx.impl.ShippableVertexPartition

Does this suggest that my graph wasn't partitioned properly? I suspect it affects performance.

Please suggest whether I'm following every step correctly.

Thanks in advance,
-Shreyansh

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Graphx-optimal-partitions-for-a-graph-and-error-in-logs-tp9455.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.