[GitHub] spark pull request #20081: [SPARK-22833][EXAMPLE] Improvement SparkHive Scal...

cloud-fan Mon, 25 Dec 2017 20:43:39 -0800

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20081


    [SPARK-22833][EXAMPLE] Improvement SparkHive Scala Examples

    ## What changes were proposed in this pull request?
    Some improvements:
    1. Point out that we are using both Spark SQL native syntax and HQL syntax 
in the example.
    2. Avoid using the same name for a table and a temp view, so as not to confuse users.
    3. Create the external hive table with a directory that already has data, 
which is a more common use case.
    4. Remove the usage of `spark.sql.parquet.writeLegacyFormat`. This config 
was introduced by https://github.com/apache/spark/pull/8566 and has nothing to 
do with Hive.
    5. Remove the `repartition` and `coalesce` examples. These two are not 
Hive-specific, so we should put them in a different example file. Also, they 
can't accurately control the number of output files on their own, because 
`spark.sql.files.maxRecordsPerFile` also controls it.
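
To illustrate point 5, here is a rough back-of-the-envelope sketch (not part of the pull request) of why `repartition` or `coalesce` alone does not fix the number of output files. It assumes a simplified model: a non-partitioned table, one write task per DataFrame partition, empty partitions producing no file, and a task starting a new file whenever the `spark.sql.files.maxRecordsPerFile` limit is reached (zero or negative meaning no limit). The function name `estimate_output_files` and the record counts are hypothetical.

```python
import math


def estimate_output_files(partition_sizes, max_records_per_file=0):
    """Estimate the number of files written under the simplified model above.

    partition_sizes: record count of each DataFrame partition (hypothetical input).
    max_records_per_file: stands in for spark.sql.files.maxRecordsPerFile;
        a value <= 0 is treated as "no limit".
    """
    total = 0
    for n in partition_sizes:
        if n == 0:
            # Assumption: an empty partition contributes no output file.
            continue
        if max_records_per_file > 0:
            # The task rolls over to a new file each time the limit is hit.
            total += math.ceil(n / max_records_per_file)
        else:
            # No limit: one file per non-empty partition.
            total += 1
    return total


# Two partitions of 500 records each, as after a hypothetical repartition(2).
print(estimate_output_files([500, 500]))       # no record limit
print(estimate_output_files([500, 500], 200))  # limit of 200 records per file
```

Under these assumptions the same two-partition write yields 2 files without a limit and 6 files with a limit of 200 records, so the partition count is only one of the inputs.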
    
    ## How was this patch tested?
    
    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20081
    
----
commit 10a80b272e898043e250c2b24a792c9474cf0d10
Author: Wenchen Fan <wenchen@...>
Date:   2017-12-26T04:30:10Z

    clean up

----

