[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

CodingCat Sat, 02 Dec 2017 22:17:47 -0800

Github user CodingCat commented on the issue:

    https://github.com/apache/spark/pull/19864
  
    @viirya yes, we can get more accurate stats later, however, the first stats 
is also important as it enables the user to pay less for `the first run` which 
writes cache. 
    
    The current implementation always chooses the most expensive plan in the 
first run, e.g. always resort to sortmergejoin instead of broadcastjoin even it 
is possible, CBO is actually disabled for any operator which locates in 
downstream of InMemoryRelation. Additionally, it makes execution plan 
inconsistent even for the same query over the same dataset. Of course, all of 
these issues happen in the first run.
    
    IMHO, we have a chance to make it better, why not?




---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19864: [SPARK-22673][SQL] InMemoryRelation should utilize exist...

Reply via email to