Github user zecevicp commented on the issue:

    https://github.com/apache/spark/pull/21109
  
    Sorry, I don't quite understand your question. This optimization applies 
only to equi-joins that have additional range conditions on secondary columns. 
So if Spark were to rewrite those range conditions (?) and you ended up with 
two equi-joins (which doesn't sound like a realistic scenario), then this 
optimization wouldn't apply at all.
    But if you have an equi-join that is otherwise impractical because it has 
to match a huge number of rows, and the range conditions let you narrow down 
the search window, then the advantage is that the join becomes feasible and/or 
much, much faster (see the sketch below).
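
    To make the join shape concrete, here is a minimal sketch (the table and 
column names are made up for illustration): an equi-join on one key column 
plus range conditions on a secondary column, which is the pattern this change 
targets.

    ```scala
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("range-join-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical data; names and schemas are made up for illustration.
    val events  = Seq((1L, 10L, 20L), (2L, 5L, 15L)).toDF("id", "start", "end")
    val samples = Seq((1L, 12L), (1L, 99L), (2L, 7L)).toDF("id", "ts")

    // Equi-join on "id" plus range conditions on the secondary column "ts".
    // Without range-based narrowing, every pair of rows sharing an "id" has
    // to be compared; narrowing the search window avoids most of that work.
    val joined = events.join(
      samples,
      events("id") === samples("id") &&
        samples("ts") >= events("start") &&
        samples("ts") <= events("end"))
    joined.show()
    ```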
    
    Regarding the second point, I don't want to tell you what to do, but for 
my part I can say that this has been tested with unit tests and on real, large 
datasets, and I believe it should be safe to merge. But it can also wait for 
2.5/3.0...


