GitHub user zecevicp commented on the issue:

    https://github.com/apache/spark/pull/21109
  
    There's no design doc. I didn't feel the change was big enough to warrant 
one.
    
    1. Currently there is no spill-over to disk. If the range is too big, users 
can switch the optimization off and fall back to the much slower standard SMJ 
(sort-merge join) version, without hitting an OOM. Implementing spill-over 
doesn't look trivial because this version is more dynamic than the original 
one, and it's not clear how to implement it. Maybe we can add that in the 
future, once we figure it out?
    2. This whole optimization doesn't apply when the join has no equality 
condition.
    3. I didn't understand the case you're describing. Can you elaborate, 
please? Either way, only one pass through the data is needed, whether the data 
is skewed or not.


---

