[ https://issues.apache.org/jira/browse/TEZ-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-4245:
----------------------------------
    Attachment: TEZ-4245.1.patch

> Optimise split grouping when locality information is set to null/empty
> ----------------------------------------------------------------------
>
>                 Key: TEZ-4245
>                 URL: https://issues.apache.org/jira/browse/TEZ-4245
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Major
>         Attachments: TEZ-4245.1.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In object stores like S3, locality information always shows up as
> "localhost". Having this information in the input split slows down
> scheduling, as explained in https://issues.apache.org/jira/browse/HIVE-14060.
> Systems like Hive remove the "localhost" information from splits.
>
> Splits without any usable locality information (localhost/null/empty)
> should be treated equally, so that split grouping can do meaningful grouping
> based on cluster size. This avoids creating small split groups, which can
> significantly increase runtime due to sequential processing (i.e., the same
> map task receives many inputs and the system ends up spending its time in
> open/seek/close calls on the object store).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
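
The idea in the description, treating "localhost", null and empty locations as the same "no locality" case, could be sketched roughly as below. This is only an illustration under stated assumptions, not the contents of TEZ-4245.1.patch: the class SplitLocality and the methods isUninformative, normalize and hasLocality are hypothetical names, not part of the Tez API, and dropping uninformative entries from a mixed location list is an assumed policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of the Tez API; names are assumptions.
final class SplitLocality {

    // A location carries no placement information if it is null, blank,
    // or the "localhost" placeholder that object stores such as S3 report.
    static boolean isUninformative(String host) {
        return host == null
            || host.trim().isEmpty()
            || host.trim().equalsIgnoreCase("localhost");
    }

    // Returns only the locations that name a real host. An empty result
    // means the split should be handled as "no locality".
    static String[] normalize(String[] locations) {
        if (locations == null) {
            return new String[0];
        }
        List<String> kept = new ArrayList<>();
        for (String host : locations) {
            if (!isUninformative(host)) {
                kept.add(host.trim());
            }
        }
        return kept.toArray(new String[0]);
    }

    // True if at least one usable host remains after normalization.
    static boolean hasLocality(String[] locations) {
        return normalize(locations).length > 0;
    }

    public static void main(String[] args) {
        String[][] samples = {
            null,
            {},
            {"localhost"},
            {"node1.example.com", "localhost"}
        };
        for (String[] sample : samples) {
            System.out.println(hasLocality(sample));
        }
    }
}
```

With a normalization step like this applied before grouping, a split reporting {"localhost"} and a split reporting no locations at all would fall into the same bucket, so the grouper could size groups by cluster capacity rather than by a placeholder host name.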