Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/13818
  
    I have a few questions. 
    
    1. Is it a regression from 1.6? It looks like it is not.
    2. Is it a correctness issue or a performance issue? It seems to be a performance issue.
    3. If it is a performance issue, what is the impact? For a Hive Parquet/ORC table, after we convert it to Spark's native code path, there is no partition discovery, so I guess the cost mainly comes from querying the metastore. If so, what is the performance difference once `spark.sql.hive.metastorePartitionPruning` (which queries only the needed partition info from the Hive metastore) is enabled? See the sketch after this list.
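
    For anyone benchmarking question 3, here is a minimal sketch of enabling that flag in a Hive-enabled session (the app name and the `logs` table with its `dt` partition column are hypothetical placeholders, not part of this PR):

```scala
import org.apache.spark.sql.SparkSession

// Build a Hive-enabled session; the app name is a placeholder.
val spark = SparkSession.builder()
  .appName("metastore-pruning-check")
  .enableHiveSupport()
  .getOrCreate()

// With this enabled, Spark asks the Hive metastore only for the
// partitions that a query's predicates select, instead of listing
// all partition metadata up front.
spark.conf.set("spark.sql.hive.metastorePartitionPruning", "true")

// A partition predicate like this one can then be pruned in the
// metastore call itself (`logs` / `dt` are hypothetical):
spark.sql("SELECT * FROM logs WHERE dt = '2016-06-01'").show()
```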
    
    My feeling is that if it is a performance issue and not a regression from 1.6, merging to master should be good enough.

