[GitHub] spark issue #20081: [SPARK-22833][EXAMPLE] Improvement SparkHive Scala Examp...

chetkhatri Mon, 25 Dec 2017 21:07:50 -0800

Github user chetkhatri commented on the issue:

    https://github.com/apache/spark/pull/20081
  
    @cloud-fan Thanks for the PR.
    4. spark.sql.parquet.writeLegacyFormat - if you don't set this
configuration, a Hive external table won't be able to read the Parquet data.
    5. repartition and coalesce are the most common way in industry to control
the number of files under a directory when writing partitioned data.
    i.e. if the data volume is very large, every partition ends up with many
small files, which may harm downstream query performance due to file I/O,
bandwidth, network I/O, and disk I/O.
    Otherwise I am good with your approach.
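
To make points 4 and 5 concrete, here is a minimal sketch of what such a write might look like. It is not the code from the PR: the object name, the sample data, the `dt` partition column, and the `/tmp` output paths are all assumptions made for illustration, and it runs against a local master instead of a real Hive deployment.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical example object; names and paths are illustrative only.
object HiveParquetWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveParquetWriteSketch")
      .master("local[*]") // assumption: local run for demonstration
      // Point 4: write Parquet in the legacy layout that Hive expects.
      .config("spark.sql.parquet.writeLegacyFormat", "true")
      .getOrCreate()

    import spark.implicits._

    // Assumed sample data with a date-like partition column "dt".
    val df = Seq(
      (1, "a", "2017-12-25"),
      (2, "b", "2017-12-25"),
      (3, "c", "2017-12-26")
    ).toDF("key", "value", "dt")

    val outputPath = "/tmp/hive_parquet_sketch" // hypothetical location

    // Point 5: repartition by the partition column so that all rows for
    // one dt value land in the same task, giving one file per dt directory
    // instead of many small files.
    df.repartition($"dt")
      .write
      .mode(SaveMode.Overwrite)
      .partitionBy("dt")
      .parquet(outputPath)

    // coalesce reduces the number of partitions without a full shuffle;
    // here the whole DataFrame is written through a single task.
    df.coalesce(1)
      .write
      .mode(SaveMode.Overwrite)
      .parquet(outputPath + "_coalesced")

    spark.stop()
  }
}
```

Roughly speaking, repartition fits when the data has to be redistributed by a key before the write, while coalesce is the cheaper choice when the goal is only to lower the number of output files.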



