Flink only supports sorting within partitions. Thus, if you want to write
out a globally sorted dataset you should set the parallelism to 1 which
effectively results in a single partition. Decreasing the parallelism of an
operator will cause the individual partitions to lose its sort order
because the individual partitions are read in a non deterministic order.
On Thu, Feb 8, 2018 at 8:07 PM, david westwood <david.d.westw...@gmail.com>
> I would like to sort historical data using the dataset api.
> val dataset = [(Long, String)] ..
> .sortPartition(_._1, Order.ASCEDING)
> the data is out of order (in local order)
> prints the data in to correct order. I have run a small toy sample
> multiple times.
> Is there a way to sort the entire dataset with parallelism > 1 and write
> it to a single file in ascending order?