I've noticed that calling pyarrow.Table.sort_indices [1] and pyarrow.compute.array_sort_indices [2], which Table.sort_indices is based on, maxes out a single CPU core. Is there any way to scale sorting beyond a single CPU?

It looks like a custom radix sort is implemented [3], so this may not be trivial. Originally I thought there might be a way to use some of the new C++17 parallel algorithm functionality ([4] and [5]).

[1] pyarrow.compute.sort_indices — Apache Arrow v8.0.0 <https://arrow.apache.org/docs/python/generated/pyarrow.compute.sort_indices.html#:~:text=Return%20the%20indices%20that%20would,the%20end%20of%20the%20input.>
[2] pyarrow.compute.array_sort_indices — Apache Arrow v8.0.0 <https://arrow.apache.org/docs/python/generated/pyarrow.compute.array_sort_indices.html>
[3] arrow/vector_sort.cc at 7a0f00c16e084d194ae53d209b33b809cfc8f2d5 · apache/arrow (github.com) <https://github.com/apache/arrow/blob/7a0f00c16e084d194ae53d209b33b809cfc8f2d5/cpp/src/arrow/compute/kernels/vector_sort.cc#L456>
[4] Parallel Algorithms of the STL with the GCC Compiler - ModernesCpp.com <https://www.modernescpp.com/index.php/parallel-algorithms-of-the-stl-with-gcc>
[5] Using C++17 Parallel Algorithms for Better Performance - C++ Team Blog (microsoft.com) <https://devblogs.microsoft.com/cppblog/using-c17-parallel-algorithms-for-better-performance/>

Thanks,
Cedric
