Partitioning under Spark 1.0.x

2014-08-19 Thread losmi83
Hi guys, I want to create two RDD[(K, V)] objects and then co-locate partitions with the same K on one node. Even when the same partitioner is used for both RDDs, partitions with the same K end up on different nodes. Here is a small example that illustrates this: // Let's say I have 10 nodes
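A minimal sketch of the setup being described, assuming a spark-shell session (sc in scope) on Spark 1.0.x; the names rdd1 and rdd2 are illustrative, not from the original message:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.SparkContext._  // pair-RDD operations on Spark 1.0.x

    val partitioner = new HashPartitioner(10)

    // Two RDDs over the same keys, both partitioned by the same partitioner.
    val rdd1 = sc.parallelize(0 until 10).map(x => (x, x)).partitionBy(partitioner)
    val rdd2 = sc.parallelize(0 until 10).map(x => (x, 2 * x)).partitionBy(partitioner)

    // Sharing a partitioner gives co-partitioning: the join below needs no
    // shuffle. It does not, however, guarantee co-location -- partition i of
    // rdd1 and partition i of rdd2 may still be scheduled on different nodes,
    // which is the behavior reported in this thread.
    val joined = rdd1.join(rdd2)
    joined.collect().foreach(println)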

Re: Where do my partitions go?

2014-08-08 Thread losmi83
Thanks for your answer, but the same problem appears if you start from one common RDD: val partitioner = new HashPartitioner(10) val dummyJob = sc.parallelize(0 until 10).map(x => (x, x)) dummyJob.partitionBy(partitioner).foreach { case (ind, x) => println("Dummy1 -> Id = " +
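A hedged completion of the truncated snippet above; everything after the cut-off println is a reconstruction, and the hostname printout is an assumption rather than the original text:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.SparkContext._  // pair-RDD operations on Spark 1.0.x

    val partitioner = new HashPartitioner(10)
    val dummyJob = sc.parallelize(0 until 10).map(x => (x, x))

    // First pass over the common RDD: print which host each key lands on.
    dummyJob.partitionBy(partitioner).foreach { case (ind, x) =>
      println("Dummy1 -> Id = " + ind + " on " +
        java.net.InetAddress.getLocalHost.getHostName)
    }

    // Second pass with the same RDD and the same partitioner; per this
    // thread, the same key may be reported on a different host.
    dummyJob.partitionBy(partitioner).foreach { case (ind, x) =>
      println("Dummy2 -> Id = " + ind + " on " +
        java.net.InetAddress.getLocalHost.getHostName)
    }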

Re: Partitioning: Where do my partitions go?

2014-08-08 Thread losmi83
I'd appreciate it if anyone could confirm whether this is a bug or intended behavior of Spark. Thanks, Milos

Where do my partitions go?

2014-08-07 Thread losmi83
Hi guys, the latest Spark version 1.0.2 exhibits very strange behavior when deciding on which node a given partition should reside. The following example was tested in standalone mode. val partitioner = new HashPartitioner(10) val dummyJob1 = sc.parallelize
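A sketch of how the truncated example likely continued, pieced together from the rest of the thread; dummyJob2 and the hostname printout are assumptions, not the original message:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.SparkContext._  // pair-RDD operations on Spark 1.0.x

    val partitioner = new HashPartitioner(10)

    // Two independently created RDDs with identical keys and partitioner.
    val dummyJob1 = sc.parallelize(0 until 10).map(x => (x, x))
    dummyJob1.partitionBy(partitioner).foreach { case (k, v) =>
      println("Job1 -> key = " + k + " on " +
        java.net.InetAddress.getLocalHost.getHostName)
    }

    val dummyJob2 = sc.parallelize(0 until 10).map(x => (x, x))
    dummyJob2.partitionBy(partitioner).foreach { case (k, v) =>
      println("Job2 -> key = " + k + " on " +
        java.net.InetAddress.getLocalHost.getHostName)
    }
    // If partitions were co-located, the same key would print the same
    // hostname in both jobs; the thread reports that it often does not.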