Github user kmanamcheri commented on the issue:

    https://github.com/apache/spark/pull/22614
  
    @gatorsmile, @tejasapatil was reviewing the code before I added the new 
config option. I have asked him to review the new code; let's see what his 
thoughts are on that. I have also asked him to clarify what he means by 
exponential backoff with retries.
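    For reference, my reading of "exponential backoff with retries" is something along these lines — a hypothetical sketch only, where the retried block stands in for the metastore pruning call (none of these names come from this PR):

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical sketch: retry a failing call, doubling the wait between
// attempts. The call being retried is a stand-in for the metastore
// partition-pruning request, not the actual method in this PR.
def withRetries[T](maxRetries: Int, initialDelayMs: Long)(call: => T): T = {
  var attempt = 0
  var delayMs = initialDelayMs
  while (true) {
    Try(call) match {
      case Success(result) => return result
      case Failure(_) if attempt < maxRetries =>
        Thread.sleep(delayMs)
        delayMs *= 2   // exponential backoff: double the wait each time
        attempt += 1
      case Failure(e) => throw e
    }
  }
  throw new IllegalStateException("unreachable")
}
```

    Whether that is what was intended here — and whether retrying helps at all when the metastore rejects the pushed-down filter deterministically rather than transiently — is exactly the clarification I've asked for.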
    
    I want to take a step back and revisit 
[SPARK-17992](https://issues.apache.org/jira/browse/SPARK-17992) and in 
particular [one of the 
comments](https://github.com/apache/spark/pull/15673#issuecomment-257120666) 
from @ericl 
    
    > For large tables, the degraded performance should be considered a bug as 
well.
    >
    > How about this.
    >
    > If direct sql is disabled, log a warning about degraded performance with 
this flag and fall back to fetching all partitions.
    > If direct sql is enabled, crash with a message suggesting to disable 
filesource partition management and report a bug.
    > That way, we will know if there are cases where metastore pruning fails 
with direct sql enabled.
    
    It looks like a compromise was reached where we don't support falling back 
to fetching all partitions in every case, only for a subset of cases. My 
suggested fix is a cleaner way of approaching this: through a SQLConf instead 
of inspecting the Hive config. 
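    To illustrate what I mean — a sketch only, with a placeholder key name, doc string, and default rather than the actual ones from this PR — a SQLConf entry inside Spark's `SQLConf` object looks roughly like:

```scala
// Hypothetical sketch of a SQLConf entry (placeholder name and default);
// buildConf is the builder used inside org.apache.spark.sql.internal.SQLConf.
val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK =
  buildConf("spark.sql.hive.metastorePartitionPruningFallback.enabled")
    .doc("When true, fall back to fetching all partitions of a table if " +
      "partition pruning via the Hive metastore fails.")
    .booleanConf
    .createWithDefault(true)
```

    The point is that the decision lives in a Spark-owned, documented config rather than being inferred from the Hive-side direct SQL setting.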
    
    Thoughts, @mallman @ericl?

