DefaultSorter is the same sorter implementation used in MapReduce and is
single threaded. PipelinedSorter, on the other hand, takes a
divide-and-conquer approach: it works on multiple sort spans, which can be
sorted by different threads. More details can be found in
http://people.apache.org/~gopalv/PipelinedSorter.pdf.

With the DefaultSorter implementation it is not possible to increase sort.mb
beyond 2 GB. With PipelinedSorter, it is possible to allocate more than 2 GB
as sort buffer. This is useful when you have large containers and can give
the sort buffer more than 2 GB to avoid potential disk spills. The number of
threads used for sorting in PipelinedSorter is controlled by
"tez.runtime.pipelined.sorter.sort.threads" (defaults to 2). Setting this to
a much higher value might not help, since the benefit depends on the number
of processors available on the system and the number of containers running
on it. Depending on the workload, 2-4 could be a sweet spot. Starting with
Tez 0.7, PipelinedSorter is the default sorter, though users can switch back
to DefaultSorter (the MapReduce implementation) by setting
"tez.runtime.sorter.class=LEGACY".
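
As a rough sketch (not taken from the Tez code base), the two settings
mentioned above could be collected as plain key/value pairs before being
copied into the job configuration. The class and method names
(SorterSettings, pipelined, legacy) are made up for illustration; only the
two property names and the LEGACY value come from this mail.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tez: it only builds the key/value pairs
// discussed in this mail so they can be copied into a job configuration.
class SorterSettings {
    // Property names as quoted in this mail.
    static final String SORT_THREADS = "tez.runtime.pipelined.sorter.sort.threads";
    static final String SORTER_CLASS = "tez.runtime.sorter.class";

    // Settings for PipelinedSorter with an explicit sort thread count.
    public static Map<String, String> pipelined(int sortThreads) {
        if (sortThreads < 1) {
            throw new IllegalArgumentException("sortThreads must be >= 1, got " + sortThreads);
        }
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(SORT_THREADS, Integer.toString(sortThreads));
        return settings;
    }

    // Settings that switch back to DefaultSorter (Tez 0.7 and later).
    public static Map<String, String> legacy() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(SORTER_CLASS, "LEGACY");
        return settings;
    }

    public static void main(String[] args) {
        // Print the pairs in key=value form, e.g. for use as -D options.
        pipelined(2).forEach((k, v) -> System.out.println(k + "=" + v));
        legacy().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Each pair can then be set on the job's configuration object, or passed as a
-Dkey=value option if your client accepts those.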

~Rajesh.B

On Wed, Jun 3, 2015 at 7:18 AM, [email protected] <[email protected]>
wrote:

> In OrderedPartitionedKVOutput, I see:
>
> if (this.conf.getInt(TezRuntimeConfiguration.TEZ_RUNTIME_SORT_THREADS,
>     TezRuntimeConfiguration.TEZ_RUNTIME_SORT_THREADS_DEFAULT) > 1) {
>   sorter = new PipelinedSorter(getContext(), conf, getNumPhysicalOutputs(),
>       memoryUpdateCallbackHandler.getMemoryAssigned());
> } else {
>   sorter = new DefaultSorter(getContext(), conf, getNumPhysicalOutputs(),
>       memoryUpdateCallbackHandler.getMemoryAssigned());
> }
>
> So when tez.runtime.sort.threads is set to a value greater than 1,
> PipelinedSorter is chosen.
> ------------------------------
> [email protected]
>



