[ https://issues.apache.org/jira/browse/MAPREDUCE-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated MAPREDUCE-7016:
-------------------------------------
    Description: {{FileInputFormat::getSplits}} uses {{FileSystem::globStatus}} to determine its inputs. When the glob returns directories, each is traversed and {{LocatedFileStatus}} instances are returned with the block locations. However, when the glob returns files, this is a {{FileStatus}} that requires a second RPC to obtain its locations.  (was: {{FileInputFormat::getSplits}} uses {{FileSystem::globStatus}} to determine its inputs. When the glob returns directories, each is traversed and {{LocatedFileStatus}} instances are returned with the block locations. However, when the glob returns files, each requires a second RPC to obtain its locations.)

> Avoid making separate RPC calls for FileStatus and block locations in FileInputFormat
> -------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7016
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7016
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Chris Douglas
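
To make the cost difference in the description concrete, here is a minimal toy model of the two paths. It is only a sketch: {{ToyNameNode}} and every method on it are hypothetical stand-ins that use the Java standard library alone, not Hadoop classes. It also assumes that the glob and the directory listing each cost exactly one call, which a real cluster may not match (a large listing can be fetched in several batches, for example).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these types are hypothetical stand-ins, not Hadoop classes.
class RpcCountSketch {

    /** Stand-in for a remote namespace service that counts the calls it receives. */
    static final class ToyNameNode {
        int rpcCount = 0;
        private final List<String> files;

        ToyNameNode(List<String> files) {
            this.files = files;
        }

        // Models a glob that matches individual files: names only, no locations.
        List<String> globFiles() {
            rpcCount++;
            return new ArrayList<>(files);
        }

        // Models a glob that matches the single parent directory.
        String globDirectory() {
            rpcCount++;
            return "/input";
        }

        // Models the follow-up lookup of block locations for one file.
        String blockLocations(String file) {
            rpcCount++;
            return "hosts-for-" + file;
        }

        // Models a directory listing that returns each file with its locations.
        List<String[]> listLocated(String dir) {
            rpcCount++;
            List<String[]> out = new ArrayList<>();
            for (String f : files) {
                out.add(new String[] {dir + "/" + f, "hosts-for-" + f});
            }
            return out;
        }
    }

    // Glob matches files: one glob call plus one location call per file.
    static int rpcsWhenGlobMatchesFiles(List<String> files) {
        ToyNameNode nn = new ToyNameNode(files);
        for (String f : nn.globFiles()) {
            nn.blockLocations(f);
        }
        return nn.rpcCount;
    }

    // Glob matches a directory: one glob call plus one located listing.
    static int rpcsWhenGlobMatchesDirectory(List<String> files) {
        ToyNameNode nn = new ToyNameNode(files);
        nn.listLocated(nn.globDirectory());
        return nn.rpcCount;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("part-0", "part-1", "part-2");
        System.out.println("glob matches files:     " + rpcsWhenGlobMatchesFiles(files));     // 4
        System.out.println("glob matches directory: " + rpcsWhenGlobMatchesDirectory(files)); // 2
    }
}
```

Under these assumptions, the file path grows as N + 1 calls for N matched files, while the directory path stays constant. That per-file second call is the overhead the issue proposes to avoid.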