Hi,
A custom partitioner is indeed the solution.
Here is some sample code:
import org.apache.spark.Partitioner

class KeyPartitioner(val keyList: Seq[Any]) extends Partitioner {
  // One partition per key, plus partition 0 for keys not in the list.
  def numPartitions: Int = keyList.size + 1
  def getPartition(key: Any): Int = keyList.indexOf(key) + 1

  override def equals(other: Any): Boolean = other match {
    case p: KeyPartitioner => p.keyList == keyList
    case _ => false
  }
  override def hashCode: Int = keyList.hashCode
}
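A usage sketch of the idea (hedged: `sc`, `rdd`, and the example keys are placeholders, not from the original thread; `partitionBy` is available on pair RDDs):

```scala
// Hypothetical usage: repartition a pair RDD so each known key
// gets its own partition; unknown keys fall into partition 0.
val keys: Seq[Any] = Seq("x", "y") // list all 10 keys here
val rdd = sc.parallelize(Seq(("x", 1), ("y", 2), ("x", 3)))
val partitioned = rdd.partitionBy(new KeyPartitioner(keys))
// partitioned.getNumPartitions is keys.size + 1
```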
Just provide your own partitioner.
Once I wrote a partitioner which keeps similar keys together in one
partition.
Best regards,
Denis
On 12 September 2016 at 19:44, sujeet jog wrote:
> Hi,
>
> Is there a way to partition a set of data with n keys into exactly n
> partitions?
Hi,
Is there a way to partition a set of data with n keys into exactly n
partitions?
For example:
a tuple of 1008 rows with key x,
a tuple of 1008 rows with key y, and so on, for a total of 10 keys (x, y, etc.)
Total records = 10080
NumOfKeys = 10
I want to partition the 10080 elements into exactly 10 partitions.