Re: Spark Dataset API for secondary sorting

2019-12-24 Thread Akira Ajisaka
Hi Daniel,

This is the user mailing list for Apache Hadoop, not Apache Spark. Please use the Spark user mailing list instead: https://spark.apache.org/community.html

-Akira

On Tue, Dec 3, 2019 at 1:00 AM Daniel Zhang wrote:
> Hi, Spark Users:
>
> I have a question related to the way I use the spark Dataset API for my
>

Spark Dataset API for secondary sorting

2019-12-02 Thread Daniel Zhang
Hi, Spark Users:

I have a question related to the way I use the Spark Dataset API for my case. Suppose the "ds_old" dataset has 100 records with 10 unique $"col1" values, and consider the following pseudo-code:

val ds_new = ds_old.repartition(5, $"col1").sortWithinPartitions($"col2").mapPartitions(new
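
For readers following the thread, here is a minimal runnable sketch of the same pipeline shape. The original message is cut off before the argument to mapPartitions, so the partition function below is a hypothetical stand-in that only reports, per partition, the row count and the number of distinct col1 values. The Rec case class, the summarize helper, the sample data, and the local SparkSession settings are likewise assumptions made for illustration, not part of the original question.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type; the thread only names the columns col1 and col2.
case class Rec(col1: String, col2: Int)

object SecondarySortSketch {
  // Same shape as the pseudo-code: hash-repartition by col1 into 5 partitions,
  // sort each partition by col2, then process each partition as an iterator.
  def summarize(dsOld: Dataset[Rec]): Dataset[(Int, Int)] = {
    import dsOld.sparkSession.implicits._
    dsOld
      .repartition(5, $"col1")
      .sortWithinPartitions($"col2")
      .mapPartitions { rows =>
        // Stand-in for the elided partition function: materialize the
        // partition and emit (row count, distinct col1 count).
        val recs = rows.toList
        Iterator.single((recs.size, recs.map(_.col1).distinct.size))
      }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("secondary-sort-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data: 100 records with 10 unique col1 values.
    val dsOld = (0 until 100).map(i => Rec(s"k${i % 10}", i)).toDS()

    summarize(dsOld).collect().foreach { case (n, k) =>
      println(s"rows=$n distinctCol1=$k")
    }
    spark.stop()
  }
}
```

With 10 distinct col1 values hashed into 5 partitions, a single partition can hold rows for more than one col1 value, so the iterator passed to mapPartitions is ordered by col2 across all the keys that landed in that partition rather than within each key separately.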