Re: GraphX seems to be broken while creating a large graph (6B nodes in my case)
I'm seeing this issue also. I have a graph with 5828339535 vertices and 7398447992 edges; graph.numVertices returns 1533266498, while graph.numEdges is correct and returns 7398447992. I'm also having an issue that I'm beginning to suspect is caused by the same underlying problem, where connected components stops after one iteration and returns an incorrect graph.

On Aug 22, 2014, at 8:43 PM, npanj <nitinp...@gmail.com> wrote:

While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result; 'numEdges' reports the correct number. A few times (with a different dataset of 2.5B nodes) I have also noticed that numVertices comes back as a negative number, so I suspect that there is some overflow (maybe we are using Int for some field?).

Environment: standalone mode running on EC2, using the latest code from the master branch up to commit db56f2df1b8027171da1b8d2571d1f2ef1e103b6.

Here are some details of the experiments I have done so far:

1. Input: numNodes=6101995593; noEdges=12163784626. Graph returns: numVertices=1807028297; numEdges=12163784626
2. Input: numNodes=2157586441; noEdges=2747322705. Graph returns: numVertices=-2137380855; numEdges=2747322705
3. Input: numNodes=1725060105; noEdges=204176821. Graph returns: numVertices=1725060105; numEdges=2041768213

You can find the code to reproduce this bug here: https://gist.github.com/npanj/92e949d86d08715bf4bf (I have also filed this JIRA ticket: https://issues.apache.org/jira/browse/SPARK-3190)
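The reported values are consistent with a 64-bit vertex count being squeezed through a 32-bit Int somewhere. As a quick sanity check (a minimal sketch in plain Scala, not GraphX internals), truncating the true input sizes to Int reproduces the bad numVertices values from experiments 1 and 2 exactly:

```scala
// Minimal sketch (plain Scala, not GraphX code) checking whether the bad
// numVertices values match 32-bit truncation of the true 64-bit counts.
object Int32OverflowCheck {
  def main(args: Array[String]): Unit = {
    // Experiment 1: 6101995593 vertices reported as 1807028297
    println(6101995593L.toInt)  // prints 1807028297 -- keeps only the low 32 bits

    // Experiment 2: 2157586441 vertices reported as -2137380855
    println(2157586441L.toInt)  // prints -2137380855 -- wraps past Int.MaxValue
  }
}
```

Both bad values equal the true count minus 2^32, which strongly suggests an Int field (or an Int-returning count) somewhere in the vertex-counting path.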
Re: Spark Contribution
That sounds like a good idea. Continuing along those lines, what do people think of moving the contributing page entirely from the wiki to GitHub? It feels like the right place for it, since GitHub is where we take contributions, and it also lets people submit improvements to it. Nick

On Saturday, August 23, 2014, Sean Owen <so...@cloudera.com> wrote:

Can I ask a related question, since I have a PR open to touch up README.md as we speak (SPARK-3069)? If this text is in a file called CONTRIBUTING.md, then it will cause a link to appear on the pull request screen, inviting people to review the contribution guidelines: https://github.com/blog/1184-contributing-guidelines This is mildly important, as the project wants to make it clear that you agree that your contribution is licensed under the AL2, since there is no formal ICLA. How about I propose moving the text to CONTRIBUTING.md with a pointer in README.md? Or keep it in both places?

On Sat, Aug 23, 2014 at 1:08 AM, Reynold Xin <r...@databricks.com> wrote:

Great idea. Added the link: https://github.com/apache/spark/blob/master/README.md

On Thu, Aug 21, 2014 at 4:06 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

We should add this link to the readme on GitHub, btw.

On Thursday, August 21, 2014, Henry Saputra <henry.sapu...@gmail.com> wrote:

The Apache Spark wiki on how to contribute should be a great place to start: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark - Henry

On Thu, Aug 21, 2014 at 3:25 AM, Maisnam Ns <maisnam...@gmail.com> wrote:

Hi, can someone help me with some links on how to contribute to Spark? Regards, mns
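For reference, Sean's proposal would amount to placing something like the following at the repository root — a hypothetical sketch, not the file the project actually adopted; the wiki URL and the AL2 note come from the thread, everything else is illustrative. GitHub surfaces any root-level CONTRIBUTING.md as a link on the new-pull-request screen, which is the mechanism the blog post above describes.

```markdown
<!-- CONTRIBUTING.md (hypothetical sketch) -->
## Contributing to Spark

Contributions via GitHub pull requests are welcome. Please read the
contribution guide on the wiki before opening a pull request:

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

By submitting a pull request, you agree that your contribution is
licensed under the Apache License 2.0.
```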
Mesos/Spark Deadlock
I just wanted to bring up a significant Mesos/Spark issue that makes the combo difficult to use for teams larger than 4-5 people. It's covered in https://issues.apache.org/jira/browse/MESOS-1688. My understanding is that Spark's use of executors in fine-grained mode behaves very differently from that of many other common Mesos frameworks.
Re: Mesos/Spark Deadlock
Hi Matei,

We have an analytics team that uses the cluster on a daily basis. They use two types of 'run modes':

1) For running actual queries, they set spark.executor.memory to something between 4 and 8GB of RAM per worker.

2) A shell that takes a minimal amount of memory on workers (128MB), for prototyping out a larger query. This lets them avoid taking up RAM on the cluster when they do not really need it.

We see the deadlocks when there are a few shells open in either case. Given our usage patterns, coarse-grained mode would be a challenge, as we would have to constantly remind people to kill their shells as soon as their queries finish. Am I correct in viewing Mesos in coarse-grained mode as being similar to Spark Standalone's CPU allocation behavior?

On Sat, Aug 23, 2014 at 7:16 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

Hey Gary, just as a workaround, note that you can use Mesos in coarse-grained mode by setting spark.mesos.coarse=true. Then it will hold onto CPUs for the duration of the job. Matei

On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com) wrote:

I just wanted to bring up a significant Mesos/Spark issue that makes the combo difficult to use for teams larger than 4-5 people. It's covered in https://issues.apache.org/jira/browse/MESOS-1688. My understanding is that Spark's use of executors in fine-grained mode behaves very differently from that of many other common Mesos frameworks.
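For anyone trying Matei's workaround, here is a minimal sketch of what the coarse-grained setup might look like. spark.mesos.coarse comes from the thread; the master URL, the spark.cores.max cap, and the memory value are illustrative assumptions (the 4g figure matches the 4-8GB range mentioned above):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the workaround: run Spark on Mesos in coarse-grained mode so
// executors hold their CPUs for the life of the job, avoiding the
// fine-grained offer deadlock described in MESOS-1688.
val conf = new SparkConf()
  .setMaster("mesos://host:5050")        // hypothetical Mesos master URL
  .setAppName("SharedClusterShell")
  .set("spark.mesos.coarse", "true")     // the workaround from the thread
  .set("spark.cores.max", "8")           // illustrative cap, not from the thread
  .set("spark.executor.memory", "4g")    // matches the 4-8GB range mentioned above

val sc = new SparkContext(conf)
```

Note that spark.cores.max only caps how many cores a job grabs; in coarse-grained mode an idle shell still holds its cores until it exits, which is exactly the behavior Gary is worried about.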