Re: bugs in Spark PageRank implementation
Hi Terence,

Which implementation are you using? I tested the GraphX implementation and the results match the Wikipedia figures closely:

id   rank                  percentage             percentage (Wikipedia)
 2   3.5658816369034536    38.43986817970977 %    38.4 %
 3   3.1809909923039688    34.29078328331496 %    34.3 %
 5   0.7503491964913347     8.088693663686792 %    8.1 %
 6   0.36259893900587814    3.9087824097247115 %   3.9 %
 4   0.36259893900587814    3.9087824097247115 %   3.9 %
 1   0.30409919649133466    3.2781606954384332 %   3.3 %
10   0.15                   1.6169858716801204 %   1.6 %
 8   0.15                   1.6169858716801204 %   1.6 %
 9   0.15                   1.6169858716801204 %   1.6 %
 7   0.15                   1.6169858716801204 %   1.6 %
11   0.15                   1.6169858716801204 %   1.6 %

This is the code I used (in spark-shell, where sc is predefined):

import org.apache.spark.graphx._

val edges = Seq((2, 3), (3, 2), (4, 1), (4, 2), (5, 2), (5, 4), (5, 6),
  (6, 2), (6, 5), (7, 2), (7, 5), (8, 2), (8, 5), (9, 2), (9, 5),
  (10, 5), (11, 5)).map(x => Edge[Int](x._1, x._2, 0))
val vertices = (1L to 11L).map(x => (x, x))
val graph = Graph[Long, Int](sc.parallelize(vertices), sc.parallelize(edges))
val res = graph.pageRank(0.1)
val resV = res.vertices.collect()
val sum = resV.map(_._2).sum
println(resV.sortBy(_._2).reverse
  .map(x => s"${x._1}: ${x._2} (${x._2 / sum * 100} %)")
  .mkString("\n"))

I used the latest master branch. As far as I can tell, only personalized PageRank has been added since 1.2, so the regular pageRank should behave the same there.

Hope this helps,

Tarek

On Wed, Jun 24, 2015 at 9:39 PM Kelly, Terence P (HP Labs Researcher) <terence.p.ke...@hp.com> wrote:

> Hi,
>
> Colleagues and I have found that the PageRank implementation bundled
> with Spark is incorrect in several ways. The code in question is in
> the Apache Spark 1.2 distribution's "examples" directory, in
> "SparkPageRank.scala".
>
> Consider the example graph presented in the colorful figure on the
> Wikipedia page for "PageRank"; below is an edge list representation,
> where vertex "A" is "1", "B" is "2", etc.:
>
> - - - - - begin
> 2 3
> 3 2
> 4 1
> 4 2
> 5 2
> 5 4
> 5 6
> 6 2
> 6 5
> 7 2
> 7 5
> 8 2
> 8 5
> 9 2
> 9 5
> 10 5
> 11 5
> - - - - - end
>
> Here's the output we get from Spark's PageRank after 100 iterations:
>
> B has rank: 1.9184837009011475.
> C has rank: 1.7807113697064196.
> E has rank: 0.24301279014684984.
> A has rank: 0.24301279014684984.
> D has rank: 0.21885362387494078.
> F has rank: 0.21885362387494078.
>
> There are three problems with the output:
>
> 1. Only six of the eleven vertices are represented in the output;
>    by definition, PageRank assigns a value to each vertex.
>
> 2. The values do not sum to 1.0; by definition, PageRank is a
>    probability vector with one element per vertex, and the sum of
>    the vector's elements must be 1.0.
>
> 3. Vertices E and A receive the same PageRank, whereas other means of
>    computing PageRank, e.g., our own homebrew code and the method
>    used by Wikipedia, assign different values to these vertices. Our
>    own code has been compared against the PageRank implementation in
>    the "NetworkX" package, and it agrees.
>
> It looks like bug #1 is due to the Spark implementation of PageRank
> not emitting output for vertices with no incoming edges, and bug #3 is
> due to the code not correctly handling vertices with no outgoing
> edges. Once #1 and #3 are fixed, normalization might be all that's
> required to fix #2 (maybe).
>
> We currently rely on the Spark PageRank for tests we're conducting;
> when do you think a fix might be ready?
>
> Thanks.
>
> -- Terence Kelly, HP Labs
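A follow-up note on bug #2: GraphX's pageRank does not normalize the ranks it returns, so dividing by their sum (as the percentage column above does) is the expected post-processing step. Below is a minimal self-contained sketch of that normalization; it assumes only Spark with GraphX on the classpath, the tolerance of 0.0001 is an arbitrary choice, and the object name NormalizedPageRank is mine, not part of any Spark API:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object NormalizedPageRank {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("NormalizedPageRank").setMaster("local[*]"))
    // Same Wikipedia example graph as above, as (source, destination) pairs.
    val edges = sc.parallelize(Seq(
      (2L, 3L), (3L, 2L), (4L, 1L), (4L, 2L), (5L, 2L), (5L, 4L), (5L, 6L),
      (6L, 2L), (6L, 5L), (7L, 2L), (7L, 5L), (8L, 2L), (8L, 5L),
      (9L, 2L), (9L, 5L), (10L, 5L), (11L, 5L)
    ).map { case (src, dst) => Edge(src, dst, 0) })
    val vertices = sc.parallelize((1L to 11L).map(v => (v, v)))
    // pageRank returns unnormalized ranks; rescale so they sum to 1.0.
    val ranks = Graph(vertices, edges).pageRank(0.0001).vertices
    val total = ranks.map(_._2).sum()
    ranks.mapValues(_ / total).collect().sortBy(-_._2).foreach {
      case (id, rank) => println(f"$id%2d: $rank%.4f")
    }
    sc.stop()
  }
}

With this rescaling every vertex appears in the output and the values form a probability vector, which addresses bugs #1 and #2 for the GraphX code path (the examples script is a separate implementation).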
Re: Running Spark 1.4 inside IntelliJ IDEA - HttpServletResponse ClassNotFoundException
Hey, I had some similar issues in the past when I was using Java 8. Are you using Java 7 or 8? (It's just an idea, because the symptoms looked alike.)

On Mon, 15 Jun 2015 at 6:52 am, Wwh 吴 wrote:

> name := "SparkLeaning"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
> //scalaVersion := "2.11.2"
>
> libraryDependencies ++= Seq(
>   //"org.apache.hive" % "hive-jdbc" % "0.13.0",
>   //"io.spray" % "spray-can" % "1.3.1",
>   //"io.spray" % "spray-routing" % "1.3.1",
>   "io.spray" % "spray-testkit" % "1.3.1" % "test",
>   "io.spray" %% "spray-json" % "1.2.6",
>   "com.typesafe.akka" %% "akka-actor" % "2.3.2",
>   "com.typesafe.akka" %% "akka-testkit" % "2.3.2" % "test",
>   "org.scalatest" %% "scalatest" % "2.2.0",
>   "org.apache.spark" %% "spark-core" % "1.4.0",
>   "org.apache.spark" %% "spark-sql" % "1.4.0",
>   "org.apache.spark" %% "spark-hive" % "1.4.0",
>   "org.apache.spark" %% "spark-mllib" % "1.4.0",
>   //"org.apache.hadoop" %% "hadoop-client" % "2.4.0",
>   "javax.servlet" % "javax.servlet-api" % "3.0.1"
>   //"org.eclipse.jetty" % "jetty-servlet" % "8.1.14.v20131031",
>   //"org.eclipse.jetty.orbit" % "javax.servlet" % "3.0.0.v201112011016",
>   //"org.mortbay.jetty" % "servlet-api" % "3.0.20100224"
> )
>
> import org.apache.spark.{SparkConf, SparkContext}
> import scala.math.random
>
> object SparkPI {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf().setAppName("Spark Pi")
>     conf.setMaster("local")
>
>     val spark = new SparkContext(conf)
>     val slices = if (args.length > 0) args(0).toInt else 2
>     val n = 10 * slices
>     val count = spark.parallelize(1 to n, slices).map { i =>
>       val x = random * 2 - 1
>       val y = random * 2 - 1
>       if (x * x + y * y < 1) 1 else 0
>     }.reduce(_ + _)
>     println("Pi is roughly " + 4.0 * count / n)
>     spark.stop()
>   }
> }
>
> When I run this program, it fails with the error below. Can anyone help?
>
> 15/06/15 21:40:08 INFO HttpServer: Starting HTTP Server
> Exception in thread "main" java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
>     at org.apache.spark.HttpServer.org$apache$spark$HttpServer$$doStart(HttpServer.scala:75)
>     at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62)
>     at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62)
>     at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
>     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>     at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
>     at org.apache.spark.HttpServer.start(HttpServer.scala:62)
>     at org.apache.spark.HttpFileServer.initialize(HttpFileServer.scala:46)
>     at org.apache.spark.SparkEnv$.create(SparkEnv.scala:350)
>     at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
>     at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
>     at org.learn.SparkPI$.main(SparkPI.scala:24)
>     at org.learn.SparkPI.main(SparkPI.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> Caused by: java.lang.ClassNotFoundException: javax.servlet.http.HttpServletResponse
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     ... 19 more
> 15/06/15 21:40:08 INFO DiskBlockManager: Shutdown hook called
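One more thought, in case the JDK version is not the culprit: with sbt-built Spark projects, a frequently reported cause of exactly this NoClassDefFoundError is the "org.eclipse.jetty.orbit" % "javax.servlet" dependency (commented out in the build above), whose "orbit" packaging Ivy resolves but never downloads as a jar, so the servlet classes never reach the runtime classpath. What follows is only a sketch of the commonly circulated build.sbt workaround, not something verified against this exact build:

// Sketch of a workaround: force Ivy to fetch the orbit artifact as a plain
// jar so that javax.servlet.http.* actually lands on the runtime classpath.
libraryDependencies += "org.eclipse.jetty.orbit" % "javax.servlet" % "3.0.0.v201112011016" artifacts (
  Artifact("javax.servlet", "jar", "jar")
)

If that does not apply, running Class.forName("javax.servlet.http.HttpServletResponse") in a scratch main before constructing the SparkContext is a quick way to confirm whether the class is visible at runtime at all.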