I've read these pages. In the paper "GraphX: Graph Processing in a Distributed Dataflow Framework", the authors report that PageRank takes only about 400 seconds on the uk-2007-05 dataset, which is a similar size to my dataset. Is the current GraphX the same version as the GraphX in that paper? And how many partitions did the experiment use for the uk-2007-05 dataset? I tried 16 and 192 partitions, and both runs got stuck.
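For context, my setup looks roughly like this (a minimal sketch assuming a Scala Spark shell with `sc` in scope; the HDFS path, partition count, and iteration count are placeholders, not my exact configuration):

```scala
// Hypothetical sketch of the GraphX PageRank run described above.
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

// numEdgePartitions controls how many partitions the edge list is split
// into on load (this is where I tried 16 and 192).
val graph = GraphLoader
  .edgeListFile(sc, "hdfs:///data/uk-2007-05/edges.txt", numEdgePartitions = 192)
  .partitionBy(PartitionStrategy.EdgePartition2D)

// Run a fixed number of PageRank iterations instead of iterating to convergence.
val ranks = graph.staticPageRank(numIter = 10).vertices
```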
-------- Original message --------
From: Ted Yu <yuzhih...@gmail.com>
To: txw...@outlook.com
Cc: user <u...@spark.apache.org>
Sent: Friday, January 16, 2015, 02:23
Subject: Re: Is spark suitable for large scale pagerank, such as 200 million nodes, 2 billion edges?

Have you seen http://search-hadoop.com/m/JW1q5pE3P12 ?

Please also take a look at the end-to-end performance graph on http://spark.apache.org/graphx/

Cheers

On Thu, Jan 15, 2015 at 9:29 AM, txw <t...@outlook.com> wrote:

Hi,

I am running PageRank on a large dataset, which includes 200 million nodes and 2 billion edges. Is Spark suitable for large-scale PageRank? How many cores and how much memory do I need, and how long will it take?

Thanks,
Xuewei Tang