I wasn't able to reproduce this with a small test file, but I did change
the file parsing to use x(1).toLong instead of x(2).toLong. Did you mean to
take the third column rather than the second?
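For reference, the arrays returned by split are zero-indexed. x(0) is the first column, x(1) the second, and x(2) the third. Below is a minimal sketch of that parsing step. The three-column sample line and the names ColumnIndexDemo and parseEdge are hypothetical, since the actual file layout isn't known.

```scala
// Hypothetical helper: the real file's column layout is an assumption here.
object ColumnIndexDemo {
  // Split a tab-separated line and pick the source/destination columns
  // by zero-based index: x(0) is the first column, x(1) the second, x(2) the third.
  def parseEdge(line: String, srcCol: Int, dstCol: Int): (Long, Long) = {
    val x = line.split("\t")
    (x(srcCol).toLong, x(dstCol).toLong)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical three-column line.
    val line = "394365859\t136153151\t42"
    println(parseEdge(line, 0, 1)) // (394365859,136153151)
    println(parseEdge(line, 0, 2)) // (394365859,42)
  }
}
```

On a line with only two columns, asking for index 2 throws an ArrayIndexOutOfBoundsException rather than silently reading the wrong field.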

If so, would you mind posting a larger sample of the file, or even the
whole file if possible?

Here's the test that succeeded:

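  test("graph.edges.distinct.count") {
    // withSpark is the helper from GraphX's LocalSparkContext test trait.
    withSpark { sc =>
      import org.apache.spark.graphx.Graph
      import org.apache.spark.graphx.PartitionStrategy.CanonicalRandomVertexCut
      import org.apache.spark.rdd.RDD

      // Two tab-separated edges, parsed from the first and second columns.
      val edgeFullStrRDD: RDD[String] = sc.parallelize(List(
        "394365859\t136153151", "589404147\t1361045425"))
      val edgeTupRDD = edgeFullStrRDD.map(x => x.split("\t"))
        .map(x => (x(0).toLong, x(1).toLong))
      // Passing a PartitionStrategy via uniqueEdges merges identical edges.
      val g = Graph.fromEdgeTuples(edgeTupRDD, defaultValue = 123,
        uniqueEdges = Option(CanonicalRandomVertexCut))
      // Raw tuples, graph edge count, and distinct edges should all agree.
      assert(edgeTupRDD.distinct.count() === 2)
      assert(g.numEdges === 2)
      assert(g.edges.distinct.count() === 2)
    }
  }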

Ankur <http://www.ankurdave.com/>
