[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-7016:
-------------------------------------
    Description: {{FileInputFormat::getSplits}} uses {{FileSystem::globStatus}} 
to determine its inputs. When the glob returns directories, each is traversed 
and {{LocatedFileStatus}} instances are returned with the block locations. 
However, when the glob returns files, each is a plain {{FileStatus}} that 
requires a second RPC to obtain its block locations.  (was: {{FileInputFormat::getSplits}} uses 
{{FileSystem::globStatus}} to determine its inputs. When the glob returns 
directories, each is traversed and {{LocatedFileStatus}} instances are returned 
with the block locations. However, when the glob returns files, each requires a 
second RPC to obtain its locations.)

> Avoid making separate RPC calls for FileStatus and block locations in 
> FileInputFormat
> -------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7016
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7016
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Chris Douglas
>
> {{FileInputFormat::getSplits}} uses {{FileSystem::globStatus}} to determine 
> its inputs. When the glob returns directories, each is traversed and 
> {{LocatedFileStatus}} instances are returned with the block locations. 
> However, when the glob returns files, each is a plain {{FileStatus}} that 
> requires a second RPC to obtain its block locations.
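
To make the cost described above concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It is a toy model, not the FileInputFormat code: the class SplitRpcSketch, the type GlobMatch, and the helper methods are hypothetical names, and it assumes that a directory listing costs a single call no matter how many children it returns (a real listing may be batched) and that each file matched directly by the glob costs exactly one extra location lookup.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the follow-up calls issued after a glob; every name here is hypothetical. */
public class SplitRpcSketch {

    /** Stand-in for one entry returned by the glob. */
    static final class GlobMatch {
        final String path;
        final boolean isDirectory;

        GlobMatch(String path, boolean isDirectory) {
            this.path = path;
            this.isDirectory = isDirectory;
        }
    }

    /** Counts the calls needed after the glob to learn block locations. */
    static int followUpRpcs(List<GlobMatch> matches) {
        int rpcs = 0;
        for (GlobMatch m : matches) {
            if (m.isDirectory) {
                // Assumption: one located listing returns all children with locations.
                rpcs += 1;
            } else {
                // The glob yielded a bare status, so this file needs its own lookup.
                rpcs += 1;
            }
        }
        return rpcs;
    }

    /** Glob that matches a single directory; its files are found by the listing. */
    static List<GlobMatch> directoryScenario() {
        List<GlobMatch> matches = new ArrayList<>();
        matches.add(new GlobMatch("/data/input", true));
        return matches;
    }

    /** Glob that matches the given number of files directly. */
    static List<GlobMatch> fileScenario(int fileCount) {
        List<GlobMatch> matches = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            matches.add(new GlobMatch("/data/input/part-" + i, false));
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println("glob matches one directory: "
                + followUpRpcs(directoryScenario()) + " follow-up call(s)");
        System.out.println("glob matches 1000 files:    "
                + followUpRpcs(fileScenario(1000)) + " follow-up call(s)");
    }
}
```

Under these assumptions the per-match cost is the same, but a directory match amortizes its one call over all of its files, while a glob that expands to individual files pays one call per file. That per-file call is the overhead this issue proposes to avoid.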



