Hi guys,
I want to create two RDD[(K, V)] objects and then collocate partitions with
the same K on one node.
Even when the same partitioner is used for both RDDs, partitions with the
same K can end up on different nodes.
Here is a small example that illustrates this:
// Let's say I have 10 nodes
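To make the setup concrete, here is a sketch of what I mean (assuming a running SparkContext `sc`; the names `rddA`, `rddB`, and `report` are just for illustration): two RDDs partitioned with the same HashPartitioner, each printing which host processes each key.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

val partitioner = new HashPartitioner(10)

// Two RDDs with the same keys, partitioned by the same partitioner.
val rddA: RDD[(Int, Int)] =
  sc.parallelize(0 until 100).map(x => (x % 10, x)).partitionBy(partitioner)
val rddB: RDD[(Int, Int)] =
  sc.parallelize(0 until 100).map(x => (x % 10, -x)).partitionBy(partitioner)

// Print which host computes each key. If partitions were collocated,
// the same key would print the same hostname for both RDDs.
def report(name: String)(rdd: RDD[(Int, Int)]): Unit =
  rdd.foreachPartition { it =>
    val host = java.net.InetAddress.getLocalHost.getHostName
    it.foreach { case (k, _) => println(s"$name -> key $k on $host") }
  }

report("A")(rddA)
report("B")(rddB)
```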
Thanks for your answer.
But the same problem appears if you start from one common RDD:
val partitioner = new HashPartitioner(10)
val dummyJob = sc.parallelize(0 until 10).map(x => (x, x))
dummyJob.partitionBy(partitioner).foreach { case (ind, x) =>
  // print the key and the host that processes it, to see where each partition lands
  println("Dummy1 -> Id = " + ind + " on " + java.net.InetAddress.getLocalHost.getHostName)
}
I'd appreciate if anyone could confirm whether this is a bug or intended
behavior of Spark.
Thanks,
Milos
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Partitioning-Where-do-my-partitions-go-tp11635p11766.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi guys,
the latest Spark version 1.0.2 exhibits very strange behavior when it
comes to deciding on which node a given partition should reside. The
following example was tested in standalone mode.
val partitioner = new HashPartitioner(10)
val dummyJob1 = sc.parallelize(0 until 10).map(x => (x, x))