Navis created HIVE-6144:
---------------------------

             Summary: Implement non-staged MapJoin
                 Key: HIVE-6144
                 URL: https://issues.apache.org/jira/browse/HIVE-6144
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Navis
            Assignee: Navis
            Priority: Minor


For a map join, all data in the small aliases is hashed and stored in a temporary
file by MapRedLocalTask. For aliases without a filter or projection, however, this
step does not seem necessary. For example:

{noformat}
select a.* from src a join src b on a.key=b.key;
{noformat}

This query produces the following plan:
{noformat}
STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        a 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        a 
          TableScan
            alias: a
            HashTable Sink Operator
              condition expressions:
                0 {key} {value}
                1 
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              Position of Big Table: 1

  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        b 
          TableScan
            alias: b
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {key} {value}
                1 
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              outputColumnNames: _col0, _col1
              Position of Big Table: 1
              Select Operator
                File Output Operator
      Local Work:
        Map Reduce Local Work
  Stage: Stage-0
    Fetch Operator
{noformat}
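As a rough illustration of the staged flow in the plan above (Stage-4 hashes alias a in a local task and dumps it, Stage-3 loads the dump and probes it with rows of b), here is a minimal, self-contained Java sketch. It is not Hive's code: the class and method names (StagedMapJoinSketch, buildAndDump, probe) are hypothetical, rows are modeled as String[], and plain Java serialization stands in for whatever format Hive actually uses for its hash table file.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

/** Hypothetical sketch of a staged map join; not Hive's actual classes. */
public class StagedMapJoinSketch {

    // "Local task": hash the small table on the join key and write it to a temp file.
    static Path buildAndDump(List<String[]> smallRows, int keyCol) throws IOException {
        HashMap<String, ArrayList<String[]>> table = new HashMap<>();
        for (String[] row : smallRows) {
            table.computeIfAbsent(row[keyCol], k -> new ArrayList<>()).add(row);
        }
        Path file = Files.createTempFile("hashtable", ".bin");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(table); // HashMap, ArrayList, String[] are all Serializable
        }
        return file;
    }

    // "Map task": load the dumped hash table, then probe it with each big-table row.
    @SuppressWarnings("unchecked")
    static List<String[]> probe(Path file, List<String[]> bigRows, int keyCol)
            throws IOException, ClassNotFoundException {
        HashMap<String, ArrayList<String[]>> table;
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            table = (HashMap<String, ArrayList<String[]>>) in.readObject();
        }
        List<String[]> result = new ArrayList<>();
        for (String[] big : bigRows) {
            List<String[]> matches = table.getOrDefault(big[keyCol], new ArrayList<>());
            for (String[] small : matches) {
                result.add(small); // inner join emitting only a.*, as in "select a.*"
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<String[]> src = List.of(
                new String[] {"1", "one"},
                new String[] {"2", "two"},
                new String[] {"2", "deux"});
        Path file = buildAndDump(src, 0);              // like Stage-4
        try {
            List<String[]> joined = probe(file, src, 0); // like Stage-3
            for (String[] r : joined) {
                System.out.println(String.join(",", r));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Self-joining the three sample rows yields five output rows: one for key 1 and four (2 x 2) for key 2.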

Here table src (alias a) is fetched and stored as-is in MRLocalTask. With this
patch, the plan can instead look like this:
{noformat}
  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        b 
          TableScan
            alias: b
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {key} {value}
                1 
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              outputColumnNames: _col0, _col1
              Position of Big Table: 1
              Select Operator
                 File Output Operator
      Local Work:
        Map Reduce Local Work
          Alias -> Map Local Tables:
            a 
              Fetch Operator
                limit: -1
          Alias -> Map Local Operator Tree:
            a 
              TableScan
                alias: a
          Has Any Stage Alias: false
  Stage: Stage-0
    Fetch Operator
{noformat}
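For comparison, a minimal sketch of the non-staged flow, again with hypothetical names (NonStagedMapJoinSketch, join) and String[] rows rather than Hive's operators: the map task consumes the fetched rows of alias a directly (the "Map Local Tables" fetch in its local work) and builds the hash table in memory, so nothing is written to a temporary file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a non-staged map join; not Hive's actual classes. */
public class NonStagedMapJoinSketch {

    // Build the in-memory hash table while the small alias is fetched,
    // then probe it with each big-table row (inner join emitting only a.*).
    static List<String[]> join(Iterator<String[]> smallFetch, Iterable<String[]> bigRows, int keyCol) {
        Map<String, List<String[]>> table = new HashMap<>();
        while (smallFetch.hasNext()) {
            String[] row = smallFetch.next();
            table.computeIfAbsent(row[keyCol], k -> new ArrayList<>()).add(row);
        }
        List<String[]> result = new ArrayList<>();
        for (String[] big : bigRows) {
            for (String[] small : table.getOrDefault(big[keyCol], Collections.emptyList())) {
                result.add(small);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> src = List.of(
                new String[] {"1", "one"},
                new String[] {"2", "two"},
                new String[] {"2", "deux"});
        // No separate local stage: the "mapper" reads alias a itself.
        for (String[] r : join(src.iterator(), src, 0)) {
            System.out.println(String.join(",", r));
        }
    }
}
```

The trade-off, presumably, is that the staged variant hashes the small table once in the local task, while the non-staged variant avoids the extra stage and temporary file at the cost of every map task fetching and hashing the small alias itself.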




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
