GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/21587
[SPARK-24588][SS] streaming join should require HashClusteredPartitioning
from children
## What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/19080 we simplified the
distribution/partitioning framework and made all the join-like operators
require `HashClusteredPartitioning` from their children. Unfortunately, the
streaming join operator was missed.
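The diff itself is not included in this message. As a rough, hedged sketch of the
shape of the change (assuming the `HashClusteredDistribution` class introduced by
#19080, which the title refers to as `HashClusteredPartitioning`, and the
`leftKeys`/`rightKeys` join-key fields of `StreamingSymmetricHashJoinExec`), the
operator would declare the stricter requirement on both children roughly like this:

```scala
// Sketch only, not the verbatim patch from this PR. Intended to be read inside
// StreamingSymmetricHashJoinExec; it is not a standalone compilable file.
import org.apache.spark.sql.catalyst.plans.physical.{Distribution, HashClusteredDistribution}

// Ask for hash partitioning on exactly the join keys from both children,
// matching what the other join-like operators do after #19080.
override def requiredChildDistribution: Seq[Distribution] =
  HashClusteredDistribution(leftKeys) :: HashClusteredDistribution(rightKeys) :: Nil
```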
This is not a real issue in practice. Only two partitionings can satisfy
`ClusteredDistribution`: hash partitioning and range partitioning. Streaming
does not support sort, so a streaming join will never end up with one side
hash-partitioned and the other range-partitioned, which is the scenario that
would produce wrong results. The streaming source API also does not support
reporting range partitioning yet. Still, we should fix this potential bug.
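To make the co-partitioning argument concrete, here is a minimal, self-contained
Scala sketch of the satisfaction rules described above. The types are deliberately
simplified stand-ins (keys as strings rather than Catalyst expressions) and are
not Spark's actual `Distribution`/`Partitioning` classes:

```scala
// Illustrative model only: hypothetical simplified types, not Spark's internals.
object SatisfactionSketch {
  sealed trait Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning
  final case class RangePartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  sealed trait Distribution {
    def isSatisfiedBy(p: Partitioning): Boolean
  }

  // Clustered: rows sharing the clustering keys must be co-located, which both
  // hash and range partitioning on (a subset of) those keys can provide.
  final case class Clustered(keys: Seq[String]) extends Distribution {
    def isSatisfiedBy(p: Partitioning): Boolean = p match {
      case HashPartitioning(ks, _)  => ks.nonEmpty && ks.forall(keys.contains)
      case RangePartitioning(ks, _) => ks.nonEmpty && ks.forall(keys.contains)
    }
  }

  // HashClustered: only hash partitioning on exactly these keys qualifies, so two
  // children that both satisfy it are guaranteed to be partitioned the same way.
  final case class HashClustered(keys: Seq[String]) extends Distribution {
    def isSatisfiedBy(p: Partitioning): Boolean = p match {
      case HashPartitioning(ks, _) => ks == keys
      case _                       => false
    }
  }

  def main(args: Array[String]): Unit = {
    val clusteredReq     = Clustered(Seq("k"))
    val hashClusteredReq = HashClustered(Seq("k"))

    val hashSide  = HashPartitioning(Seq("k"), 10)
    val rangeSide = RangePartitioning(Seq("k"), 10)

    // Both sides individually satisfy the looser clustered requirement, even
    // though the same key may land in different partitions on each side.
    println(clusteredReq.isSatisfiedBy(hashSide))   // true
    println(clusteredReq.isSatisfiedBy(rangeSide))  // true

    // Only the hash-partitioned side satisfies the stricter requirement, so a
    // planner would insert a shuffle for the range-partitioned side.
    println(hashClusteredReq.isSatisfiedBy(hashSide))   // true
    println(hashClusteredReq.isSatisfiedBy(rangeSide))  // false
  }
}
```

The last two checks are the point of the stricter requirement: once both children
must satisfy a hash-clustered requirement on the join keys, they are partitioned
identically, so matching rows always land in the same task.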
## How was this patch tested?
N/A
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark join
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21587.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21587
----
commit 1f3d9df26bc543802b02b9f5b20178f6255752dd
Author: Wenchen Fan <wenchen@...>
Date: 2018-06-18T23:55:47Z
StreamingSymmetricHashJoinExec should require HashClusteredPartitioning
from children
----