Thanks Ankur,
Built it from git and it works great. I have another issue now. I am trying to process a huge graph with about 20 billion edges with GraphX. I only load the file, compute connected components and persist the result right back to disk. When working with subgraphs (with ~50M edges) this works well, but on the whole graph it seems to choke on the graph construction part. Can you advise on how to tune Spark's memory parameters for this task?

Thanks,
Alex

From: Ankur Dave [mailto:ankurd...@gmail.com]
Sent: Thursday, May 22, 2014 6:59 PM
To: user@spark.apache.org
Subject: Re: GraphX partition problem

The fix will be included in Spark 1.0, but if you just want to apply the fix to 0.9.1, here's a hotfixed version of 0.9.1 that only includes PR #367:
https://github.com/ankurdave/spark/tree/v0.9.1-handle-empty-partitions
You can clone and build this.

Ankur <http://www.ankurdave.com/>

On Thu, May 22, 2014 at 4:53 AM, Zhicharevich, Alex <azhicharev...@ebay.com> wrote:

Hi,

I’m running a simple connected components job using GraphX (version 0.9.1). My input comes from an HDFS text file partitioned into 400 parts. When I run the code on a single part or a small number of files (like 20), it runs fine. As soon as I try to read more files (more than 30), I get an error and the job fails.
From looking at the logs I see the following exception:

java.util.NoSuchElementException: End of stream
    at org.apache.spark.util.NextIterator.next(NextIterator.scala:83)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
    at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:52)
    at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:51)
    at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:456)

From searching the web, I see it’s a known issue with GraphX, here:
https://github.com/apache/spark/pull/367
and here:
https://github.com/apache/spark/pull/497

Are there any stable releases that include this fix? Should I clone the git repo and build it myself? How would you advise me to deal with this issue?

Thanks,
Alex
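For reference, the job described in the follow-up (load the edge list, compute connected components, write the result back to disk) can be sketched roughly as below against the GraphX 0.9.x/1.0 API. This is a minimal sketch, assuming the input is a whitespace-separated `srcId dstId` edge list that `GraphLoader.edgeListFile` can parse. The object name `ConnectedComponentsJob`, the `16g` executor memory and the 2000 edge partitions are hypothetical placeholders, not tuned recommendations.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

// Hypothetical job object; names and numbers below are placeholders.
object ConnectedComponentsJob {
  // Loads an edge list, computes connected components and writes
  // (vertexId, componentId) pairs back to storage as text.
  def run(sc: SparkContext, inputPath: String, outputPath: String,
          numEdgePartitions: Int): Unit = {
    // Each non-empty line not starting with '#' is parsed as "srcId dstId".
    // minEdgePartitions sets the minimum number of input/edge partitions.
    val graph = GraphLoader.edgeListFile(sc, inputPath,
      minEdgePartitions = numEdgePartitions)
    // Labels every vertex with the smallest vertex id in its component.
    val cc = graph.connectedComponents()
    cc.vertices.saveAsTextFile(outputPath)
  }

  def main(args: Array[String]): Unit = {
    val Array(inputPath, outputPath) = args
    val conf = new SparkConf()
      .setAppName("ConnectedComponentsJob")
      // Hypothetical value: size this to the executors actually available.
      .set("spark.executor.memory", "16g")
    val sc = new SparkContext(conf)
    try run(sc, inputPath, outputPath, numEdgePartitions = 2000) // hypothetical count
    finally sc.stop()
  }
}
```

Raising the partition count spreads the edges over more, smaller partitions, which is one of the knobs usually tried when graph construction runs out of memory; whether it helps at this scale would have to be measured.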