[ https://issues.apache.org/jira/browse/HADOOP-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Nauroth moved HDFS-16894 to HADOOP-18599:
-----------------------------------------------

          Component/s: fs/azure
                           (was: fs/azure)
        Fix Version/s:     (was: 3.3.2)
                  Key: HADOOP-18599  (was: HDFS-16894)
    Affects Version/s: 3.3.4
                       3.3.2
                           (was: 3.3.2)
                           (was: 3.3.4)
              Project: Hadoop Common  (was: Hadoop HDFS)

> Expose `listStatus(Path path, String startFrom)` on `AzureBlobFileSystem`
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-18599
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18599
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>    Affects Versions: 3.3.4, 3.3.2
>            Reporter: Thomas Newton
>            Priority: Minor
>
> When working with Azure blob storage, listing operations can often be quite
> slow, even on storage accounts with the hierarchical namespace.
> This can be mitigated by listing only a specific subset of directories using
> a function like
> [https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.html#listStatus-org.apache.hadoop.fs.Path-java.lang.String-org.apache.hadoop.fs.azurebfs.utils.TracingContext-],
> which accepts a `startFrom` argument and lists all files in order starting
> from there.
> I'm wondering if we could add a method to `AzureBlobFileSystem`,
> something like:
> ```
> public FileStatus[] listStatus(final Path f, final String startFrom) throws
> IOException
> ```
> This exposes the functionality that already exists on the underlying
> `AzureBlobFileSystemStore`. My understanding from reading a bit of the code
> is that users should mainly be dealing with `AzureBlobFileSystem`, and
> `AzureBlobFileSystem` seems easier to use to me, hence the benefit of
> exposing it on `AzureBlobFileSystem`.
>
> I'm very unfamiliar with Java, but I'm told that keeping strictly to
> interfaces is strongly preferred. However, I can see some examples already on
> `AzureBlobFileSystem` that do not belong to any interface (e.g. `breakLease`),
> so I'm hoping it's acceptable to add a method like the one I described only
> for the one `FileSystem` implementation.
>
> The specific motivation for this is to unblock
> [https://github.com/delta-io/delta/issues/1568]
> I would be willing to contribute this if maintainers think the plan is
> reasonable.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
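
For readers unfamiliar with the `startFrom` behaviour the issue refers to, here is a minimal sketch of the listing semantics in plain Java. It does not call Hadoop at all: `StartFromListingSketch` and `listFrom` are hypothetical names, and the assumption that `startFrom` is inclusive and follows lexical order (including when the named entry does not exist) is my reading of the `AzureBlobFileSystemStore` Javadoc, not something stated in the issue itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class StartFromListingSketch {

    /**
     * Hypothetical stand-in for a "list starting from" operation:
     * returns every entry name that sorts at or after startFrom.
     * The TreeSet keeps names in lexical (String natural) order.
     */
    static List<String> listFrom(TreeSet<String> sortedNames, String startFrom) {
        // tailSet(x, true) is the view of elements >= x (inclusive).
        return new ArrayList<>(sortedNames.tailSet(startFrom, true));
    }

    public static void main(String[] args) {
        TreeSet<String> folder =
            new TreeSet<>(Arrays.asList("afile", "bfile", "hfile", "ifile"));

        // startFrom names an existing entry: listing begins at that entry.
        System.out.println(listFrom(folder, "hfile"));

        // startFrom names a missing entry: listing begins at the next
        // entry in lexical order.
        System.out.println(listFrom(folder, "cfile"));
    }
}
```

Under those assumptions both calls yield the same two entries, `hfile` and `ifile`, which is what lets a caller skip everything that sorts before a known prefix instead of listing the whole directory.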