On Thu, Jul 10, 2014 at 8:20 AM, Yifan LI <iamyifa...@gmail.com> wrote:
>
> - how to "build the latest version of Spark from the master branch, which
> contains a fix"?


Instead of downloading a prebuilt Spark release from
http://spark.apache.org/downloads.html, follow the instructions under
"Development Version" on that page. In short:

git clone https://github.com/apache/spark.git
cd spark
sbt/sbt assembly

Then you can run bin/spark-shell and bin/spark-submit as usual, and
Graph.partitionBy should work.

> - how to specify other partition strategy, eg. CanonicalRandomVertexCut,
> EdgePartition1D, EdgePartition2D, RandomVertexCut
> (listed in Scala API document, but seems only "EdgePartition2D" is
> available? I am not sure for this! )


All of those partition strategies should be available -- for example, you
can call graph.partitionBy(PartitionStrategy.RandomVertexCut).
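
As a rough illustration of that call, here is a minimal sketch that repartitions a tiny graph with each of the four built-in strategies. The object name PartitionStrategyDemo, the local[2] master, and the toy edge list are assumptions made up for this example; in spark-shell you would reuse the existing sc instead of creating a context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Hypothetical demo object; the name and the toy data are illustrative only.
object PartitionStrategyDemo {
  // The four strategies listed in the GraphX API docs.
  val strategies: Seq[PartitionStrategy] = Seq(
    PartitionStrategy.RandomVertexCut,
    PartitionStrategy.CanonicalRandomVertexCut,
    PartitionStrategy.EdgePartition1D,
    PartitionStrategy.EdgePartition2D)

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two threads.
    val conf = new SparkConf().setAppName("PartitionStrategyDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // A small directed graph; every vertex gets the default attribute 0.
      val edges = sc.parallelize(Seq(
        Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1), Edge(1L, 3L, 1)))
      val graph = Graph.fromEdges(edges, 0)

      for (strategy <- strategies) {
        // partitionBy returns a new graph whose edges are redistributed.
        val partitioned = graph.partitionBy(strategy)
        println(s"$strategy: ${partitioned.edges.count()} edges in " +
          s"${partitioned.edges.partitions.length} partitions")
      }
    } finally {
      sc.stop()
    }
  }
}
```

The edge count should be the same for every strategy, since partitionBy only changes where edges are placed.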

> - Is it possible to add my own partition strategy(hash function, etc.) into
> GraphX?


Yes, you just need to create a subclass of PartitionStrategy as follows:

import org.apache.spark.graphx._

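object MyPartitionStrategy extends PartitionStrategy {
  override def getPartition(
      src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
    // Put your hash function here. As a placeholder, assign each edge by
    // its source vertex ID; the double modulo keeps the result in
    // [0, numParts) even for negative IDs.
    (((src % numParts) + numParts) % numParts).toInt
  }
}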
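
For the hash function itself, one possibility is to ignore edge direction so that (a, b) and (b, a) land in the same partition. The sketch below is only an illustration: CanonicalHashSketch and partitionFor are made-up names, not part of GraphX, and it uses plain Long and Int (GraphX defines VertexId as Long and PartitionID as Int) so it can be tried without Spark on the classpath.

```scala
// Hypothetical helper, not part of GraphX.
object CanonicalHashSketch {
  // Map an edge to a partition in [0, numParts), ignoring its direction.
  def partitionFor(src: Long, dst: Long, numParts: Int): Int = {
    require(numParts > 0, "numParts must be positive")
    // Order the endpoints so both directions hash identically.
    val lo = math.min(src, dst)
    val hi = math.max(src, dst)
    // The tuple's hashCode mixes both endpoints; floorMod keeps the
    // result non-negative even when the hash is negative.
    java.lang.Math.floorMod((lo, hi).hashCode, numParts)
  }
}
```

Inside getPartition you could then return CanonicalHashSketch.partitionFor(src, dst, numParts), and pass your strategy object to graph.partitionBy like any built-in one.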

Ankur <http://www.ankurdave.com/>
