Thomas Newton created HDFS-16894:
------------------------------------

             Summary: Expose `listStatus(Path path, String startFrom)` on 
`AzureBlobFileSystem`
                 Key: HDFS-16894
                 URL: https://issues.apache.org/jira/browse/HDFS-16894
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: fs/azure
    Affects Versions: 3.3.4, 3.3.2
            Reporter: Thomas Newton
             Fix For: 3.3.2


When working with Azure Blob Storage, listing operations can often be quite 
slow, even on storage accounts with the hierarchical namespace enabled. 

This can be mitigated by listing only a specific subset of directories using a 
function like 
[https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.html#listStatus-org.apache.hadoop.fs.Path-java.lang.String-org.apache.hadoop.fs.azurebfs.utils.TracingContext-],
which accepts a `startFrom` argument and lists all files in order starting from 
there.

I'm wondering if we could add a method to `AzureBlobFileSystem`, something 
like:

```
public FileStatus[] listStatus(final Path f, final String startFrom) throws IOException
```

This would expose functionality that already exists on the underlying 
`AzureBlobFileSystemStore`. My understanding from reading a bit of the code is 
that users should mainly be dealing with `AzureBlobFileSystem`, which also 
seems easier to use to me, hence the benefit of exposing the method there.
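
To make the idea concrete, here is a minimal, self-contained sketch of the delegation I have in mind. It does not use the real Hadoop classes: `ToyStore` and `ToyFileSystem` are hypothetical stand-ins for `AzureBlobFileSystemStore` and `AzureBlobFileSystem`, plain strings stand in for `Path` and `FileStatus`, and I am assuming that entries come back in lexicographic order and that `startFrom` is inclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class StartFromListingSketch {

    /** Hypothetical stand-in for AzureBlobFileSystemStore (not the Hadoop class). */
    static class ToyStore {
        // A TreeSet keeps the entry names sorted lexicographically.
        private final TreeSet<String> names = new TreeSet<>();

        void add(String name) {
            names.add(name);
        }

        /**
         * Lists entry names in order. If startFrom is given, only names that
         * sort at or after it are returned (inclusive bound is an assumption).
         */
        List<String> listStatus(String startFrom) {
            if (startFrom == null || startFrom.isEmpty()) {
                return new ArrayList<>(names);
            }
            return new ArrayList<>(names.tailSet(startFrom, true));
        }
    }

    /** Hypothetical stand-in for AzureBlobFileSystem. */
    static class ToyFileSystem {
        private final ToyStore store;

        ToyFileSystem(ToyStore store) {
            this.store = store;
        }

        /** Existing-style listing: everything in the directory. */
        List<String> listStatus() {
            return store.listStatus(null);
        }

        /** Proposed overload: a thin delegation to what the store already offers. */
        List<String> listStatus(String startFrom) {
            return store.listStatus(startFrom);
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.add("0003.json");
        store.add("0001.json");
        store.add("0002.json");

        ToyFileSystem fs = new ToyFileSystem(store);
        System.out.println(fs.listStatus());            // [0001.json, 0002.json, 0003.json]
        System.out.println(fs.listStatus("0002.json")); // [0002.json, 0003.json]
    }
}
```

The point of the sketch is only that the new overload would be a small pass-through; the real method would presumably also need to qualify the path and build a `TracingContext` the way the existing `listStatus(Path)` does.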

 

I'm very unfamiliar with Java, but I'm told that keeping strictly to interfaces 
is strongly preferred. However, I can see some methods already on 
`AzureBlobFileSystem` that do not belong to any interface (e.g. `breakLease`), 
so I'm hoping it's acceptable to add a method like the one I described to just 
this one `FileSystem` implementation.

 

The specific motivation for this is to unblock 
[https://github.com/delta-io/delta/issues/1568]

I would be willing to contribute this if the maintainers think the plan is 
reasonable.



