GitHub user brkyvz opened a pull request:

    https://github.com/apache/spark/pull/15169

    [SPARK-17613] S3A base paths with no '/' at the end return empty DataFrames

    ## What changes were proposed in this pull request?
    
    Suppose you have a bucket `s3a://some-bucket` that contains these files:
    ```
    s3a://some-bucket/file1.parquet
    s3a://some-bucket/file2.parquet
    ```
    Taking the parent path of `s3a://some-bucket/file1.parquet` yields
    `s3a://some-bucket/`, and `ListingFileCatalog` uses this as the key in its
    hash map.
    
    When `catalog.allFiles` is called, we use `s3a://some-bucket` (no slash at
    the end) to look up the list of files. That key is not in the map, so we
    are left with an empty list!
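The key mismatch can be reproduced without touching S3. Below is a minimal Python sketch, not Spark or Hadoop code: `parent_of` is a hypothetical stand-in for Hadoop's `Path.getParent`, assumed (as described above) to return `s3a://some-bucket/` for a file directly under the bucket, and `build_index` is a hypothetical stand-in for the catalog's hash map from parent path to files.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

def parent_of(path: str) -> str:
    """Hypothetical model of Hadoop's Path.getParent for s3a:// URIs."""
    parts = urlsplit(path)
    # Drop the last segment; a file directly under the bucket gets "/".
    parent = parts.path.rsplit("/", 1)[0] or "/"
    return urlunsplit((parts.scheme, parts.netloc, parent, "", ""))

def build_index(files):
    """Hypothetical catalog map: parent directory -> files under it."""
    index = defaultdict(list)
    for f in files:
        index[parent_of(f)].append(f)
    return dict(index)

files = ["s3a://some-bucket/file1.parquet", "s3a://some-bucket/file2.parquet"]
index = build_index(files)

base = "s3a://some-bucket"                # root path as the user passes it
print(sorted(index))                      # the only key ends in "/"
print(index.get(base, []))                # lookup misses -> empty list
print(index.get(base + "/", []))          # with the slash, both files are found
```

Looking up the bucket without its trailing slash misses the only key in the map, which is the empty listing described above.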
    
    This PR fixes this by appending a `/` to the `URI` iff the given `Path`
    has no parent, i.e. it is the root. This is a no-op if the path already
    ends with `/`, which is handled by the Hadoop `Path` path-merging semantics.
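The normalization can be sketched the same way. `ensure_root_slash` is a hypothetical helper built on Python's `urllib.parse`, assuming a path counts as "the root" when its URI path component is empty or `/`. The actual patch operates on Hadoop `Path`/`URI` objects and relies on `Path`'s merging to make the call a no-op; this sketch instead checks for the existing `/` explicitly.

```python
from urllib.parse import urlsplit, urlunsplit

def ensure_root_slash(path: str) -> str:
    """Hypothetical helper: append "/" only when `path` is a bucket root."""
    parts = urlsplit(path)
    if parts.path in ("", "/"):
        # Root path (no parent): force a trailing slash; no-op if already present.
        return urlunsplit((parts.scheme, parts.netloc, "/", parts.query, parts.fragment))
    # Non-root paths are returned unchanged.
    return path

# Catalog keys for root-level files end in "/", as in the PR description.
index = {"s3a://some-bucket/": ["s3a://some-bucket/file1.parquet",
                                "s3a://some-bucket/file2.parquet"]}

for base in ["s3a://some-bucket", "s3a://some-bucket/"]:
    # Both spellings of the root now resolve to the same catalog key.
    print(base, "->", index.get(ensure_root_slash(base), []))
print(ensure_root_slash("s3a://some-bucket/dir"))
```

Restricting the change to root paths leaves every other key untouched, since only the root's parent-derived key carries the trailing slash.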
    
    
    ## How was this patch tested?
    
    Unit test in `FileCatalogSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brkyvz/spark SPARK-17613

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15169.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15169
    
----
commit e168a6311cf85b00c187cc64eca605ea9e5a8a6d
Author: Burak Yavuz <brk...@gmail.com>
Date:   2016-09-20T21:36:16Z

    S3A base paths with no '/' at the end return empty DataFrames

----


