Hi Cedric,
Yes it definitely is possible. There are roughly two popular ways: parallel 
merge sort and parallel bucket sort. Parallel merge sort sorts individual 
batches and then merges them according to some schedule. Parallel bucket sort 
samples the input data and does range partitioning into n buckets, where n is 
the number of threads. Then each thread sorts the individual bucket, and the 
buckets are concatenated. 

As to why arrow doesn’t do this, no one has gotten around to it yet 🙂

As for C++17 parallel sort, I’m afraid we can’t use that since we are still on 
C++11. Also it’s not clear how we’d reuse it if we wanted to sort on 
larger-than-memory datasets, which would involve writing to/from disk. 

Sasha Krassovsky 

> 24 мая 2022 г., в 19:17, Cedric Yau <[email protected]> написал(а):
> 
> 
> I've noticed in calling pyarrow.Table.sort_indices[1] and 
> pyarrow.compute.array_sort_indices[2], which Table.sort_indices is based on, 
> that CPU consumption maxes out a single core.  Are there any ways to scale 
> sorting beyond a single CPU?
> 
> It looks like there is a custom Radix Sort implemented[3] so this may not be 
> trivial.  Originally I thought there might be a way to use some of the new 
> C++17 parallel algorithm functionality ([4] and [5]). 
> 
> [1] pyarrow.compute.sort_indices — Apache Arrow v8.0.0
> [2] pyarrow.compute.array_sort_indices — Apache Arrow v8.0.0
> [3] arrow/vector_sort.cc at 7a0f00c16e084d194ae53d209b33b809cfc8f2d5 · 
> apache/arrow (github.com)
> [4] Parallel Algorithms of the STL with the GCC Compiler - ModernesCpp.com
> [5] Using C++17 Parallel Algorithms for Better Performance - C++ Team Blog 
> (microsoft.com)
> 
> Thanks,
> Cedric

Reply via email to