Hi Chapman,
You can use "cluster by" to do what you want.
https://deepsense.io/optimize-spark-with-distribute-by-and-cluster-by/
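For example, the same effect (partition by key, then sort within each partition) can be sketched with the DataFrame API; the column name "key" and the partition count here are placeholders, and note this uses hash partitioning rather than a custom Partitioner:

```scala
import org.apache.spark.sql.functions.col

// Assuming a DataFrame `df` with a column named "key".
// repartition + sortWithinPartitions is the DataFrame analogue of
// RDD.repartitionAndSortWithinPartitions.
val outputPartitionCount = 300
val partitioned = df
  .repartition(outputPartitionCount, col("key")) // hash-partition by key
  .sortWithinPartitions(col("key"))              // sort inside each partition

// Roughly equivalent Spark SQL, as described in the linked article:
// SELECT * FROM my_table CLUSTER BY key
```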

2017-06-24 17:48 GMT+07:00 Saliya Ekanayake <esal...@gmail.com>:

> I haven't worked with Datasets, but would this help?
> https://stackoverflow.com/questions/37513667/how-to-create-a-spark-dataset-from-an-rdd
>
> On Jun 23, 2017 5:43 PM, "Keith Chapman" <keithgchap...@gmail.com> wrote:
>
>> Hi,
>>
>> I have code that does the following using RDDs,
>>
>> val outputPartitionCount = 300
>> val part = new MyOwnPartitioner(outputPartitionCount)
>> val finalRdd = myRdd.repartitionAndSortWithinPartitions(part)
>>
>> where myRdd is correctly formed as key-value pairs. I am looking to
>> convert this to use Dataset/DataFrame instead of RDDs, so my questions are:
>>
>> Does Spark support custom partitioning of a Dataset/DataFrame?
>> Can I accomplish the per-partition sort using mapPartitions on the
>> resulting partitioned Dataset/DataFrame?
>>
>> Any thoughts?
>>
>> Regards,
>> Keith.
>>
>> http://keith-chapman.com
>>
>
