[ https://issues.apache.org/jira/browse/YARN-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198492#comment-17198492 ]
Steve Loughran commented on YARN-10444:
---------------------------------------

The openFile() API is in Hadoop 3.3.0, but the standard seek option key/values are only now going in. These changes will be part of the main patch; the YARN JIRA is there for completeness/awareness.

> use openFile() with sequential IO for localizing files.
> -------------------------------------------------------
>
>                 Key: YARN-10444
>                 URL: https://issues.apache.org/jira/browse/YARN-10444
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>
> HADOOP-16202 adds standard options for declaring the read/seek
> policy when reading a file. These should be set to sequential IO
> when localising resources, so that if the default/cluster settings
> for a file system are optimized for random IO, artifact downloads
> are still read at the maximum speed possible (one big GET to the EOF).
> Most of this happens in hadoop-common, but some tuning of FSDownload
> can assist:
> * tar/jar download must also be sequential
> * if the FileStatus is passed around, that can be used
>   in the open request to skip checks when loading the file.
>
> Together this can save 3 HEAD requests per resource, with the sequential
> IO avoiding any splitting of the big read into separate block GETs.