GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/6090

    [SPARK-7567] [SQL] Migrating Parquet data source to FSBasedRelation

    This PR migrates the Parquet data source to the newly introduced
`FSBasedRelation`. `FSBasedParquetRelation` is created to replace
`ParquetRelation2`. The major differences are:
    
    1. Partition discovery code has been factored out into `FSBasedRelation`.
    1. `AppendingParquetOutputFormat` is no longer used. Instead, an anonymous
subclass of `ParquetOutputFormat` handles appending and writing
dynamic partitions.
    1. When scanning partitioned tables, `FSBasedParquetRelation.buildScan`
only builds an `RDD[Row]` for a single selected partition.
    1. `FSBasedParquetRelation` doesn't rely on Catalyst expressions for filter
push-down, so it no longer extends `CatalystScan`.
    
       After migrating `JSONRelation` (which extends `CatalystScan`), we can 
remove `CatalystScan`.
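
To make items 1 and 3 above more concrete, here is a rough, self-contained sketch of what partition discovery and per-partition file selection can look like. It is not the code in this PR and does not use any Spark API: it assumes a Hive-style `column=value` directory layout, keeps partition values as plain strings (no type inference), and the names `parse_partition_values` and `discover_partitions` are hypothetical. It is written in Python only to keep the illustration short.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def parse_partition_values(file_path, base_path):
    """Return the (column, value) pairs encoded in the directories
    between base_path and the data file (hypothetical helper)."""
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(base_path))
    pairs = []
    for segment in relative.parts[:-1]:  # skip the file name itself
        column, sep, value = segment.partition("=")
        if not sep or not column:
            raise ValueError(f"not a partition directory: {segment!r}")
        pairs.append((column, value))
    return tuple(pairs)


def discover_partitions(file_paths, base_path):
    """Group data files by the partition they belong to."""
    partitions = defaultdict(list)
    for path in file_paths:
        partitions[parse_partition_values(path, base_path)].append(path)
    return dict(partitions)


# Assumed example layout; these paths are made up for illustration.
files = [
    "/table/year=2014/month=12/part-00000.parquet",
    "/table/year=2015/month=5/part-00000.parquet",
    "/table/year=2015/month=5/part-00001.parquet",
]
partitions = discover_partitions(files, "/table")

# A scan for one selected partition only needs that partition's files.
selected = partitions[(("year", "2015"), ("month", "5"))]
```

Once files are grouped this way, the scan for a selected partition only has to read the files listed under that partition's key, which is the idea behind building one `RDD[Row]` per selected partition.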

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark parquet-migration

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6090.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6090
    
----
commit ec9950c591e5b981ce20fab96562db28488e0035
Author: Cheng Lian <[email protected]>
Date:   2015-05-10T19:21:51Z

    Migrates Parquet data source to FSBasedRelation

commit f4482cad46f90a47c96e7fbc91710470cf709835
Author: Cheng Lian <[email protected]>
Date:   2015-05-11T12:41:09Z

    Minor bug fix and more tests

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
