Hello~ I have been running some PageRank tests with GraphX on my 8-node cluster. I allocated 32 GB of memory and 8 CPU cores to each worker. The LiveJournal dataset took 370 s, which seems reasonable to me. But when I tried the com-Friendster dataset ( http://snap.stanford.edu/data/com-Friendster.html ), which has 65,608,366 nodes and 1,806,067,135 edges, the job has been running for more than 70 hours and still has not finished. I'm not sure what causes such a large difference: is it the structure of the graph, or some property of GraphX that I'm not aware of? Thanks~