[ https://issues.apache.org/jira/browse/TEZ-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166857#comment-17166857 ]

Rajesh Balamohan edited comment on TEZ-4208 at 7/29/20, 3:59 AM:
-----------------------------------------------------------------

Q67 runtime with/without patch in internal cluster @ 10 TB scale:

|| ||Without Patch||With Patch||
|Job Runtime (in seconds)|1961.63 s|1656.14 s|
|TaskCounter_Map_1_OUTPUT_Reducer_2:|x|x |
|OUTPUT_BYTES_PHYSICAL: |457771151796|311823523913|
|OUTPUT_RECORDS:|20169930972|20169930972|
|SHUFFLE_CHUNK_COUNT:|37776|5193|


was (Author: rajesh.balamohan):
Q67 runtime with/without patch in internal cluster @ 10 TB scale:

|| ||Without Patch||With Patch||
|Job Runtime (in seconds)|1961.63 s|1656.14 s|
|TaskCounter_Map_1_OUTPUT_Reducer_2:| | |
|OUTPUT_BYTES_PHYSICAL: |457771151796|311823523913|
|OUTPUT_RECORDS:|20169930972|20169930972|
|SHUFFLE_CHUNK_COUNT:|37776|5193|


> Pipelinesorter uses single SortSpan after spill
> -----------------------------------------------
>
>                 Key: TEZ-4208
>                 URL: https://issues.apache.org/jira/browse/TEZ-4208
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Priority: Major
>         Attachments: TEZ-4208.1.patch, q67_sorter.log
>
>
> Though it could have created multiple spans, Tez always uses the first span
> after a spill. Other spans may well be bigger than the first one because of
> progressive space allocation. Fixing this would help reduce the number of
> spills (depending on the job) and the load on indexcache entries (as fewer
> files have to be opened).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
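
To illustrate why the choice of span after a spill affects the spill count, here is a toy model in Java. It is only a sketch and is not the Tez sorter code or the attached patch: the class `SpanChoiceSketch`, its methods, and the span capacities are hypothetical, and it assumes each spill flushes exactly one span's worth of data.

```java
/** Toy model only; all names and numbers are hypothetical, not Tez APIs. */
public class SpanChoiceSketch {

    /** Behaviour described in the issue: always reuse the first span. */
    public static int firstSpan(long[] spanCapacities) {
        return 0;
    }

    /** Alternative policy: reuse the span with the largest capacity. */
    public static int largestSpan(long[] spanCapacities) {
        int best = 0;
        for (int i = 1; i < spanCapacities.length; i++) {
            if (spanCapacities[i] > spanCapacities[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Spills needed to write totalBytes if every spill flushes one full span. */
    public static long spillsNeeded(long totalBytes, long spanCapacity) {
        return (totalBytes + spanCapacity - 1) / spanCapacity; // ceiling division
    }

    public static void main(String[] args) {
        // Progressive allocation: later spans are larger (made-up sizes, in MB).
        long[] capacities = {16, 32, 64};
        long totalMb = 1024;

        long withFirst = spillsNeeded(totalMb, capacities[firstSpan(capacities)]);
        long withLargest = spillsNeeded(totalMb, capacities[largestSpan(capacities)]);

        System.out.println("spills when reusing the first span:   " + withFirst);
        System.out.println("spills when reusing the largest span: " + withLargest);
    }
}
```

In this model a larger reused span means fewer spill files for the same amount of output, which is the direction of the SHUFFLE_CHUNK_COUNT change reported in the table; the actual ratio depends on the job and on how the real sorter allocates spans.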