Re: bugs in Spark PageRank implementation

2015-06-24 Thread Tarek Auel
Hi Terence,

Which implementation are you using? I tested it, and the results look
very good:

id: result value (percentage), percentage (Wikipedia)
2: 3.5658816369034536 (38.43986817970977 %), 38.4%
3: 3.1809909923039688 (34.29078328331496 %), 34.3%
5: 0.7503491964913347 (8.088693663686792 %), 8.1%
6: 0.36259893900587814 (3.9087824097247115 %), 3.9%
4: 0.36259893900587814 (3.9087824097247115 %), 3.9%
1: 0.30409919649133466 (3.2781606954384332 %), 3.3%
10: 0.15 (1.6169858716801204 %), 1.6%
8: 0.15 (1.6169858716801204 %), 1.6%
9: 0.15 (1.6169858716801204 %), 1.6%
7: 0.15 (1.6169858716801204 %), 1.6%
11: 0.15 (1.6169858716801204 %), 1.6%

This is the code I used:

import org.apache.spark.graphx.{Edge, Graph}

// Edge list from the Wikipedia PageRank example (A = 1, B = 2, ...).
val edges = Seq((2, 3), (3, 2), (4, 1), (4, 2), (5, 2), (5, 4),
  (5, 6), (6, 2), (6, 5), (7, 2), (7, 5), (8, 2), (8, 5), (9, 2),
  (9, 5), (10, 5), (11, 5)).map(x => new Edge[Int](x._1, x._2, 0))
val vertices = (1L to 11L).map(x => (x, x))
val graph = Graph[Long, Int](sc.parallelize(vertices), sc.parallelize(edges))
// Run PageRank until the ranks converge within a tolerance of 0.1.
val res = graph.pageRank(0.1)
val resV = res.vertices.collect()
// Normalize by the total so the values can be read as percentages.
val sum = resV.map(_._2).sum
println(resV.sortBy(_._2).reverse
  .map(x => s"${x._1}: ${x._2} (${x._2 / sum * 100} %)")
  .mkString("\n"))
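
Since you ran 100 iterations, it may also be worth comparing against
the fixed-iteration variant. Here is a minimal sketch on the same
graph (I haven't re-checked these numbers myself):

// staticPageRank runs exactly numIter iterations instead of
// iterating until a tolerance is reached.
val resStatic = graph.staticPageRank(100)
val staticV = resStatic.vertices.collect()
// GraphX ranks are not normalized to sum to 1.0, so divide by the
// total to read the result as a probability vector.
val staticSum = staticV.map(_._2).sum
staticV.sortBy(-_._2).foreach { case (id, rank) =>
  println(s"$id: ${rank / staticSum}")
}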

I used the latest master branch; as far as I can tell, personalized
PageRank is the only thing that has been added since 1.2.
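
For completeness, the personalized variant on master can be called
roughly like this (a sketch only; please double-check the exact
signature on your branch):

// Personalized PageRank biased towards vertex 2 ("B" in the
// Wikipedia figure); tol is the convergence tolerance, as above.
val personalized = graph.personalizedPageRank(src = 2L, tol = 0.001)
personalized.vertices.collect().sortBy(-_._2).foreach(println)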

Hope this helps,
Tarek

On Wed, Jun 24, 2015 at 9:39 PM Kelly, Terence P (HP Labs Researcher) <
terence.p.ke...@hp.com> wrote:

> Hi,
>
> Colleagues and I have found that the PageRank implementation bundled
> with Spark is incorrect in several ways.  The code in question is in
> the Apache Spark 1.2 distribution's "examples" directory, in the file
> "SparkPageRank.scala".
>
> Consider the example graph presented in the colorful figure on the
> Wikipedia page for "PageRank"; below is an edge list representation,
> where vertex "A" is "1", "B" is "2", etc.:
>
> - - - - - begin
> 2 3
> 3 2
> 4 1
> 4 2
> 5 2
> 5 4
> 5 6
> 6 2
> 6 5
> 7 2
> 7 5
> 8 2
> 8 5
> 9 2
> 9 5
> 10 5
> 11 5
> - - - - - end
>
> Here's the output we get from Spark's PageRank after 100 iterations:
>
> B has rank: 1.9184837009011475.
> C has rank: 1.7807113697064196.
> E has rank: 0.24301279014684984.
> A has rank: 0.24301279014684984.
> D has rank: 0.21885362387494078.
> F has rank: 0.21885362387494078.
>
> There are three problems with the output:
>
> 1. Only six of the eleven vertices are represented in the output;
>by definition, PageRank assigns a value to each vertex.
>
> 2. The values do not sum to 1.0; by definition, PageRank is a
>probability vector with one element per vertex and the sum of
>the elements of the vector must be 1.0.
>
> 3. Vertices E and A receive the same PageRank, whereas other means of
>    computing PageRank, e.g., our own homebrew code and the method
>    used by Wikipedia, assign different values to these vertices.  We
>    have compared our own code against the PageRank implementation in
>    the "NetworkX" package, and the two agree.
>
> It looks like bug #1 is due to the Spark implementation of PageRank
> not emitting output for vertices with no incoming edges, and bug #3
> is due to the code not correctly handling vertices with no outgoing
> edges.  Once #1 and #3 are fixed, normalization might be all that is
> required to fix #2.
>
> We currently rely on the Spark PageRank for tests we're
> conducting; when do you think a fix might be ready?
>
> Thanks.
>
> -- Terence Kelly, HP Labs
>


Re: Running spark1.4 inside intellij idea HttpServletResponse - ClassNotFoundException

2015-06-15 Thread Tarek Auel
Hey,

I had similar issues in the past when I used Java 8. Are you using
Java 7 or 8? (It's just an idea, but it's worth checking.)
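
If the JDK turns out not to be the culprit, the NoClassDefFoundError
could also come from conflicting servlet-api jars on the classpath.
As a guess on my part (not a verified fix), you could try forcing a
single servlet-api version in build.sbt:

// Force one servlet-api version across all transitive dependencies
// (sbt 0.13 syntax). This is only a guess at resolving the
// NoClassDefFoundError, not a confirmed fix.
dependencyOverrides += "javax.servlet" % "javax.servlet-api" % "3.0.1"
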
On Mon 15 Jun 2015 at 6:52 am Wwh 吴  wrote:

> name := "SparkLeaning"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
> //scalaVersion := "2.11.2"
>
> libraryDependencies ++= Seq(
>   //"org.apache.hive"% "hive-jdbc" % "0.13.0"
>   //"io.spray" % "spray-can" % "1.3.1",
>   //"io.spray" % "spray-routing" % "1.3.1",
>   "io.spray" % "spray-testkit" % "1.3.1" % "test",
>   "io.spray" %% "spray-json" % "1.2.6",
>   "com.typesafe.akka" %% "akka-actor" % "2.3.2",
>   "com.typesafe.akka" %% "akka-testkit" % "2.3.2" % "test",
>   "org.scalatest" %% "scalatest" % "2.2.0",
>   "org.apache.spark" %% "spark-core" % "1.4.0",
>   "org.apache.spark" %% "spark-sql" % "1.4.0",
>   "org.apache.spark" %% "spark-hive" % "1.4.0",
>   "org.apache.spark" %% "spark-mllib" % "1.4.0",
>   //"org.apache.hadoop" %% "hadoop-client" % "2.4.0"
>   "javax.servlet" % "javax.servlet-api" % "3.0.1"//,
>   //"org.eclipse.jetty"%%"jetty-servlet"%"8.1.14.v20131031",
>   //"org.eclipse.jetty.orbit"%"javax.servlet"%"3.0.0.v201112011016"
>   //"org.mortbay.jetty"%%"servlet-api"%"3.0.20100224"
>
> )
>
> import org.apache.spark.{SparkConf, SparkContext}
> import scala.math.random
>
> object SparkPI {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf().setAppName("Spark Pi")
>     conf.setMaster("local")
>
>     val spark = new SparkContext(conf)
>     val slices = if (args.length > 0) args(0).toInt else 2
>     val n = 10 * slices
>     // Monte Carlo estimate: count random points in the unit square
>     // that fall inside the unit circle.
>     val count = spark.parallelize(1 to n, slices).map { i =>
>       val x = random * 2 - 1
>       val y = random * 2 - 1
>       if (x * x + y * y < 1) 1 else 0
>     }.reduce(_ + _)
>     println("Pi is roughly " + 4.0 * count / n)
>     spark.stop()
>   }
> }
>
> When running this program, the following error occurs. Can anyone help?
>
> 15/06/15 21:40:08 INFO HttpServer: Starting HTTP Server
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> javax/servlet/http/HttpServletResponse
>   at 
> org.apache.spark.HttpServer.org$apache$spark$HttpServer$$doStart(HttpServer.scala:75)
>   at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62)
>   at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62)
>   at 
> org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>   at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
>   at org.apache.spark.HttpServer.start(HttpServer.scala:62)
>   at org.apache.spark.HttpFileServer.initialize(HttpFileServer.scala:46)
>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:350)
>   at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
>   at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
>   at org.apache.spark.SparkContext.(SparkContext.scala:424)
>   at org.learn.SparkPI$.main(SparkPI.scala:24)
>   at org.learn.SparkPI.main(SparkPI.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> Caused by: java.lang.ClassNotFoundException: 
> javax.servlet.http.HttpServletResponse
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   ... 19 more
> 15/06/15 21:40:08 INFO DiskBlockManager: Shutdown hook called
>