GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/17247

    [SPARK-19905][SQL] Bring back Dataset.inputFiles for Hive SerDe tables

    ## What changes were proposed in this pull request?
    
    `Dataset.inputFiles` works by matching `FileRelation`s in the query plan. 
In Spark 2.1, Hive SerDe tables are represented by `MetastoreRelation`, which 
inherits from `FileRelation`. However, in Spark 2.2, Hive SerDe tables are now 
represented by `CatalogRelation`, which doesn't inherit from `FileRelation` 
anymore, due to the unification of Hive SerDe tables and data source tables. 
This change breaks `Dataset.inputFiles` for Hive SerDe tables.
    
    This PR fixes the issue by explicitly matching `CatalogRelation`s 
that are Hive SerDe tables in `Dataset.inputFiles`. Note that we can't simply 
make `CatalogRelation` inherit from `FileRelation`, since not all 
`CatalogRelation`s are file-based (e.g., JDBC data source tables).
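    
    The matching logic can be sketched roughly as below. This is a 
simplified, self-contained illustration: the type names mirror Spark's 
(`FileRelation`, `CatalogRelation`) but are hypothetical stand-ins, not the 
actual Catalyst internals.
    
    ```scala
    // Hypothetical, simplified stand-ins for Spark's plan/relation types.
    trait LogicalPlan { def children: Seq[LogicalPlan] = Nil }
    
    trait FileRelation { def inputFiles: Array[String] }
    
    // Data source tables still mix in FileRelation, so they match directly.
    case class HadoopFsRelation(files: Array[String])
        extends LogicalPlan with FileRelation {
      def inputFiles: Array[String] = files
    }
    
    // In Spark 2.2, Hive SerDe tables are CatalogRelations, which no
    // longer inherit from FileRelation.
    case class CatalogRelation(locations: Seq[String], isHiveTable: Boolean)
        extends LogicalPlan
    
    def inputFiles(plan: LogicalPlan): Array[String] = {
      val files = plan match {
        case r: FileRelation => r.inputFiles
        // The fix: also match CatalogRelations that are Hive SerDe tables.
        case CatalogRelation(locs, true) => locs.toArray
        case _ => plan.children.flatMap(inputFiles).toArray
      }
      files.distinct
    }
    ```
    
    The explicit `CatalogRelation` case is needed precisely because adding 
`FileRelation` as a supertype would wrongly imply that every catalog table 
(e.g., a JDBC-backed one) has input files.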
    
    ## How was this patch tested?
    
    New test case added in `HiveDDLSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-19905-hive-table-input-files

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17247
    
----
commit 3e0abc48de8219e3daf839584ec874855ced1210
Author: Cheng Lian <[email protected]>
Date:   2017-03-10T19:42:17Z

    Bring back Dataset.inputFiles for Hive SerDe tables

----

