[ 
https://issues.apache.org/jira/browse/HADOOP-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth moved HDFS-16894 to HADOOP-18599:
-----------------------------------------------

          Component/s: fs/azure
                           (was: fs/azure)
        Fix Version/s:     (was: 3.3.2)
                  Key: HADOOP-18599  (was: HDFS-16894)
    Affects Version/s: 3.3.4
                       3.3.2
                           (was: 3.3.2)
                           (was: 3.3.4)
              Project: Hadoop Common  (was: Hadoop HDFS)

> Expose `listStatus(Path path, String startFrom)` on `AzureBlobFileSystem`
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-18599
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18599
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>    Affects Versions: 3.3.4, 3.3.2
>            Reporter: Thomas Newton
>            Priority: Minor
>
> Listing operations on Azure Blob Storage can often be quite slow, even on 
> storage accounts with the hierarchical namespace enabled. 
> This can be mitigated by listing only a specific subset of directories using 
> a function like 
> [https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.html#listStatus-org.apache.hadoop.fs.Path-java.lang.String-org.apache.hadoop.fs.azurebfs.utils.TracingContext-]
> which accepts a `startFrom` argument and lists all files in order, starting 
> from there.
> I'm wondering if we could add a method to `AzureBlobFileSystem`, 
> something like:
> ```
> public FileStatus[] listStatus(final Path f, final String startFrom)
>     throws IOException
> ```
> This exposes functionality that already exists on the underlying 
> `AzureBlobFileSystemStore`. My understanding from reading a bit of the code 
> is that users should mainly be dealing with `AzureBlobFileSystem`, which 
> also seems easier to use to me, hence the benefit of exposing the method 
> there.
>  
> I'm very unfamiliar with Java, but I'm told that keeping strictly to 
> interfaces is strongly preferred. However, I can see some examples already on 
> `AzureBlobFileSystem` that do not belong to any interface (e.g. `breakLease`), 
> so I'm hoping it's acceptable to add a method like the one I described only 
> for this one `FileSystem` implementation.
>  
> The specific motivation for this is to unblock 
> [https://github.com/delta-io/delta/issues/1568]
> I would be willing to contribute this if maintainers think the plan is 
> reasonable. 
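
The request above amounts to a thin overload on the file system that forwards
`startFrom` to the store. The following is only a rough, self-contained sketch
of that shape and does not use the real Hadoop classes: `ToyStore` and
`ToyFileSystem` are hypothetical stand-ins, and the listing semantics (lexical
order, start entry included, a non-existent `startFrom` resuming at the next
entry) are an assumption about how the store behaves, modelled here with a
`TreeSet`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy in-memory model of a startFrom listing; NOT the Hadoop ABFS classes. */
public class StartFromListingSketch {

    /** Hypothetical stand-in for the store: entry names of one directory. */
    static final class ToyStore {
        // TreeSet keeps the names in lexical order (assumed listing order).
        private final TreeSet<String> names = new TreeSet<>();

        ToyStore(List<String> entries) {
            names.addAll(entries);
        }

        /** Lists entries of dir whose name is lexically >= startFrom. */
        List<String> listStatus(String dir, String startFrom) {
            List<String> out = new ArrayList<>();
            // tailSet(x, true) is the inclusive view of all names >= x.
            for (String name : names.tailSet(startFrom, true)) {
                out.add(dir + "/" + name);
            }
            return out;
        }
    }

    /** Hypothetical stand-in for the file system that users interact with. */
    static final class ToyFileSystem {
        private final ToyStore store;

        ToyFileSystem(ToyStore store) {
            this.store = store;
        }

        /** Plain listing: start from the empty string, i.e. list everything. */
        List<String> listStatus(String dir) {
            return store.listStatus(dir, "");
        }

        /** The proposed overload: simply forwards startFrom to the store. */
        List<String> listStatus(String dir, String startFrom) {
            return store.listStatus(dir, startFrom);
        }
    }

    public static void main(String[] args) {
        ToyFileSystem fs = new ToyFileSystem(
            new ToyStore(Arrays.asList("afile", "bfile", "hfile", "ifile")));
        System.out.println(fs.listStatus("/folder"));
        System.out.println(fs.listStatus("/folder", "hfile"));
        // "cfile" does not exist, so the listing resumes at the next entry.
        System.out.println(fs.listStatus("/folder", "cfile"));
    }
}
```

In this toy model both `startFrom` calls return `/folder/hfile` and
`/folder/ifile`; the real method would return `FileStatus[]` and would also
need to pass along whatever tracing context the store method requires.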



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
