GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/20081
[SPARK-22833][EXAMPLE] Improvement SparkHive Scala Examples
## What changes were proposed in this pull request?
Some improvements:
1. Point out that we are using both Spark SQL native syntax and HQL syntax in the
example.
2. Avoid using the same name for a table and a temp view, so as not to confuse users.
3. Create the external Hive table with a directory that already has data,
which is a more common use case.
4. Remove the usage of `spark.sql.parquet.writeLegacyFormat`. This config
was introduced by https://github.com/apache/spark/pull/8566 and has nothing to
do with Hive.
5. Remove the `repartition` and `coalesce` examples. These two are not
Hive-specific, so we should put them in a different example file. BTW, they can't
accurately control the number of output files on their own, because
`spark.sql.files.maxRecordsPerFile` also affects it.
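
To illustrate point 5, here is a minimal sketch in plain Scala (no Spark dependency) of why the partition count alone does not fix the number of output files. It assumes a simplified model: a non-partitioned table, one write task per partition, empty tasks writing nothing, and a task starting a new file each time it reaches `spark.sql.files.maxRecordsPerFile` records. `OutputFileEstimate` and `estimateFiles` are hypothetical names made up for this sketch, not Spark APIs, and the record counts are invented.

```scala
object OutputFileEstimate {
  // Hypothetical helper: estimates the file count under the simplified model above.
  // recordsPerTask: number of records each write task (partition) holds.
  // maxRecordsPerFile: value of spark.sql.files.maxRecordsPerFile;
  //                    zero or negative means "no limit" (0 is the default).
  def estimateFiles(recordsPerTask: Seq[Long], maxRecordsPerFile: Long): Long =
    recordsPerTask.filter(_ > 0).map { n =>
      if (maxRecordsPerFile <= 0) 1L                    // one file per non-empty task
      else (n + maxRecordsPerFile - 1) / maxRecordsPerFile // ceil(n / limit)
    }.sum

  def main(args: Array[String]): Unit = {
    // Invented data: 4 evenly sized partitions, e.g. after repartition(4).
    val afterRepartition = Seq(1000L, 1000L, 1000L, 1000L)
    println(estimateFiles(afterRepartition, 0L))   // no per-file record limit
    println(estimateFiles(afterRepartition, 300L)) // limit of 300 records per file
  }
}
```

Under this model the same four partitions yield a different number of files once the per-file record limit is set, so `repartition` and `coalesce` only bound the file count from one side.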
## How was this patch tested?
N/A
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark minor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20081.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20081
----
commit 10a80b272e898043e250c2b24a792c9474cf0d10
Author: Wenchen Fan <wenchen@...>
Date: 2017-12-26T04:30:10Z
clean up
----