[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

viirya Sat, 22 Oct 2016 17:20:39 -0700

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/15575
  
    @tejasapatil I see there is 1:1 mapping among output partition of child 
operator and output partition of `ExpandExec`. 
    
    For example we have an Expand applying on a data set like col: [1, 2, 3]. 
If the projections are col, col + 1, col + 2.
    
    Assume the partition of the data set is HashPartition(col). We have three 
partitions:
    p1: [1]
    p2: [2]
    p3: [3]
    
    After the Expand, the data set becomes:
    
    p1: [1, 2, 3]
    p2: [2, 3, 4]
    p3: [3, 4, 5]
    
    Is it still valid for HashPartition(col)? Looks like it doesn't. I think It 
is why there is a comment on ExpandExec in the code position you links to.
    
    BTW, in your table `ExpandExec`'s `outputPartitioning` is 
`UnknownPartitioning`, right? If it doesn't change child's partition, why we 
don't set it to child's outputPartitioning?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

Reply via email to