Hi there,
I get an error when running a simple GraphX program.
My setup is Spark 1.4.0, Hadoop YARN 2.5, and Scala 2.10, on four virtual
machines.
With a small graph (6 nodes, 4 edges), I run:
println("triangleCount: %s ".format(
hdfs_graph.triangleCount().vertices.count() ))
and it returns the correct result.
But when I load a much larger graph (850,000 nodes, 5,000,000 edges), the same
call fails with:
15/07/20 12:03:36 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 11.0 (TID 32, 192.168.157.131): java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:165)
    at org.apache.spark.graphx.lib.TriangleCount$$anonfun$7.apply(TriangleCount.scala:90)
    at org.apache.spark.graphx.lib.TriangleCount$$anonfun$7.apply(TriangleCount.scala:87)
    at org.apache.spark.graphx.impl.VertexPartitionBaseOps.leftJoin(VertexPartitionBaseOps.scala:140)
    at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:159)
    at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:156)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
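One thing I have not ruled out yet: as far as I understand the GraphX
documentation, triangleCount expects the edges to be in canonical orientation
(srcId < dstId) and the graph to be partitioned with Graph.partitionBy. The
small graph may happen to satisfy this while the large one does not. Below is a
minimal sketch of that preparation, assuming the graph is loaded from an
edge-list file. The object name, the helper name, and the HDFS path are
hypothetical, and merging duplicate edges with groupEdges is my own guess, not
something I have verified fixes the assertion.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Graph, GraphLoader, PartitionStrategy}

object TriangleCountSketch {
  // Hypothetical helper: load an edge list and prepare it the way the
  // GraphX docs describe for triangleCount.
  def prepare(sc: SparkContext, path: String): Graph[Int, Int] =
    GraphLoader
      // canonicalOrientation = true rewrites each edge so that srcId < dstId
      .edgeListFile(sc, path, canonicalOrientation = true)
      // triangleCount expects a graph partitioned with partitionBy
      .partitionBy(PartitionStrategy.RandomVertexCut)
      // Assumption: collapse duplicate (src, dst) edges, keeping one attribute
      .groupEdges((a, b) => a)

  def run(sc: SparkContext): Unit = {
    val graph = prepare(sc, "hdfs:///path/to/edges.txt") // hypothetical path
    // Per-vertex counts: each triangle is counted once at each of its corners
    val perVertex = graph.triangleCount().vertices
    val cornerSum = perVertex.map(_._2.toLong).reduce(_ + _)
    println("triangles: %d".format(cornerSum / 3))
  }
}
```

groupEdges only merges edges that sit in the same partition, which is why it
comes after partitionBy in the sketch.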
I ran both graphs with the same submit command:
spark-submit --class "sparkUI.GraphApp" --master spark://master:7077
--executor-memory 2G --total-executor-cores 4 myjar.jar
Any thoughts? Is anything wrong with my machines or configuration?
Best regards,
Jack