[ https://issues.apache.org/jira/browse/HDFS-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HDFS-14663 stopped by Siyao Meng.
-----------------------------------------

> HttpFS: LISTSTATUS_BATCH does not return batches
> ------------------------------------------------
>
>                 Key: HDFS-14663
>                 URL: https://issues.apache.org/jira/browse/HDFS-14663
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: httpfs
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Siyao Meng
>            Priority: Major
>
> The WebHDFS protocol supports a LISTSTATUS_BATCH operation that retrieves
> the file listing of a large directory in chunks.
>
> When using the WebHDFS service embedded in the NameNode, this works as
> expected. When using HttpFS, however, every LISTSTATUS_BATCH call returns
> the entire listing in a single response, so it effectively behaves like
> LISTSTATUS.
>
> This seems to be because HttpFS falls back to
> org.apache.hadoop.fs.FileSystem#listStatusBatch, which is intended to be
> overridden, but the FileSystem implementation HttpFS uses does not override
> it, which leads to this limitation.
>
> LISTSTATUS_BATCH support was added to HttpFS by HDFS-10823, but in my
> testing it does not work as intended. I suspect this is because
> listStatusBatch was added to WebHdfsFileSystem and HttpFSFileSystem as part
> of that Jira, while behind the scenes HttpFS seems to use
> DistributedFileSystem and therefore falls back to the default
> implementation, org.apache.hadoop.fs.FileSystem#listStatusBatch, which
> returns all entries in a single batch.
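For context, a client consuming LISTSTATUS_BATCH is expected to page through the directory: each request passes the name of the last entry it received as startAfter, and the client stops once the response reports no remaining entries. The Python sketch below illustrates that loop under stated assumptions. The helper names http_fetch and list_status_batched are hypothetical, the base URL is a placeholder for either a NameNode or an HttpFS endpoint, authentication (user.name, delegation tokens) is omitted, and the JSON field names (DirectoryListing, partialListing, remainingEntries, pathSuffix) follow our reading of the WebHDFS REST documentation rather than anything stated in this issue.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List, Optional

# A fetcher takes (path, start_after) and returns one parsed LISTSTATUS_BATCH
# JSON body. Injecting it keeps the paging logic testable without a server.
Fetch = Callable[[str, Optional[str]], dict]


def http_fetch(base_url: str) -> Fetch:
    """Hypothetical helper: GET <base_url>/webhdfs/v1<path>?op=LISTSTATUS_BATCH.

    Assumption: base_url such as "http://namenode-host:9870"; no auth is sent.
    """
    def fetch(path: str, start_after: Optional[str]) -> dict:
        params = {"op": "LISTSTATUS_BATCH"}
        if start_after is not None:
            # Resume the listing after this child name.
            params["startAfter"] = start_after
        url = f"{base_url}/webhdfs/v1{path}?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    return fetch


def list_status_batched(fetch: Fetch, path: str) -> Iterator[List[dict]]:
    """Hypothetical helper: yield one list of FileStatus dicts per response."""
    start_after: Optional[str] = None
    while True:
        listing = fetch(path, start_after)["DirectoryListing"]
        statuses = listing["partialListing"]["FileStatuses"]["FileStatus"]
        yield statuses
        # Stop when the server reports nothing left, or on an empty batch
        # (guards against looping forever on a misbehaving server).
        if listing["remainingEntries"] == 0 or not statuses:
            return
        # The next request starts after the last name in this batch.
        start_after = statuses[-1]["pathSuffix"]
```

Against the NameNode's embedded WebHDFS service, a large directory should come back as several batches (as far as we know, the per-call limit is governed on the NameNode by dfs.ls.limit). Against an HttpFS server affected by this issue, the same loop would presumably yield a single batch holding the entire listing, which is the symptom described above.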