[ https://issues.apache.org/jira/browse/DRILL-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker reassigned DRILL-5912:
------------------------------------

    Assignee: Boaz Ben-Zvi

> Hash Join Enhancement: Avoid copying probe side values
> ------------------------------------------------------
>
>                 Key: DRILL-5912
>                 URL: https://issues.apache.org/jira/browse/DRILL-5912
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.11.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>            Priority: Minor
>
> When the Hash Join operator (inner or left outer join) performs the "probe and
> project" task, it copies each probe-side value that is to be projected. Example:
> {code}
>     public void projectProbeRecord(int probeIndex, int outIndex)
>         throws SchemaChangeException
>     {
>         // one copy per projected probe column: vv12 -> vv15 and vv18 -> vv21
>         {
>             vv15 .copyFromSafe((probeIndex), (outIndex), vv12);
>         }
>         {
>             vv21 .copyFromSafe((probeIndex), (outIndex), vv18);
>         }
>     }
> {code}
> When the build side has no duplicate-key entries and no spilling took place,
> each probe-side (outer) value is projected exactly once (for a left outer
> join) or at most once (for an inner join).
> In such (common) cases, we could avoid the above copy and simply transfer the
> probe-side value vectors as is (adding a Selection Vector 2 for the inner join
> to eliminate the unmatched entries).
> This can be a significant performance enhancement, as copying the values row
> by row is much more expensive than transferring the vectors (e.g., performing
> the copy up to 64K times per batch, plus allocating the vectors and possibly
> resizing them for variable-width types).
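
The decision described in the issue can be sketched in Java as follows. This is a minimal, hypothetical model, not Drill's code or API: plain `int` arrays stand in for value vectors, `NO_MATCH` is an assumed marker for a probe row without a build-side match, and `canTransferProbeSide`, `buildSelectionVector` and `readThrough` are illustrative names. For a left outer join the probe vectors would be handed over unchanged; for an inner join the sketch only records the indices of matched probe rows, one 16-bit `char` per kept row, in the spirit of a Selection Vector 2.

```java
import java.util.Arrays;

// Hypothetical sketch, not Drill's API: int arrays stand in for value vectors.
final class ProbeProjectionSketch {

    // Assumed sentinel: the probe row found no build-side match.
    static final int NO_MATCH = -1;

    // The per-row copy is avoidable only when each probe row is emitted at most once.
    static boolean canTransferProbeSide(boolean buildHasDuplicateKeys, boolean spilled) {
        return !buildHasDuplicateKeys && !spilled;
    }

    // Inner join: record the indices of matched probe rows, one 16-bit entry each,
    // in the spirit of a Selection Vector 2. The probe values themselves are not copied.
    static char[] buildSelectionVector(int[] matchIndexForProbeRow) {
        char[] sv = new char[matchIndexForProbeRow.length];
        int count = 0;
        for (int probeIndex = 0; probeIndex < matchIndexForProbeRow.length; probeIndex++) {
            if (matchIndexForProbeRow[probeIndex] != NO_MATCH) {
                sv[count++] = (char) probeIndex; // fits: a batch holds at most 64K rows
            }
        }
        return Arrays.copyOf(sv, count);
    }

    // What a downstream reader sees through the selection vector (for checking only).
    static int[] readThrough(int[] probeValues, char[] sv) {
        int[] out = new int[sv.length];
        for (int i = 0; i < sv.length; i++) {
            out[i] = probeValues[sv[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] probeValues = {10, 20, 30, 40};
        int[] matchIndex = {3, NO_MATCH, 0, NO_MATCH}; // probe rows 0 and 2 matched
        if (canTransferProbeSide(false, false)) {
            char[] sv = buildSelectionVector(matchIndex);
            System.out.println(Arrays.toString(readThrough(probeValues, sv))); // [10, 30]
        }
    }
}
```

The selection vector costs one small index per kept row, while the probe data buffers stay where they are.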

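To illustrate why the copy costs more than a transfer, the following sketch models a variable-width vector as an offsets array plus a data buffer. `VarWidthVectorSketch` and its methods are hypothetical and only mimic the shape of the per-row `copyFromSafe` calls quoted above; Drill's real vectors live in direct memory and use their own transfer mechanism (`TransferPair`). Copying touches every row and may reallocate the buffer as it fills, while a transfer just moves the buffer references, independent of the row count.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical variable-width vector (not Drill's VarCharVector): offsets + data buffer.
final class VarWidthVectorSketch {
    byte[] data = new byte[4];     // deliberately small initial buffer
    int[] offsets = {0};           // row i spans data[offsets[i] .. offsets[i + 1])
    int rowCount = 0;
    int reallocations = 0;         // times the data buffer had to grow

    static VarWidthVectorSketch of(String... values) {
        VarWidthVectorSketch v = new VarWidthVectorSketch();
        for (String s : values) {
            v.append(s.getBytes(StandardCharsets.UTF_8));
        }
        return v;
    }

    byte[] get(int index) {
        return Arrays.copyOfRange(data, offsets[index], offsets[index + 1]);
    }

    // Per-row copy in the style of copyFromSafe: appends one value, growing buffers on demand.
    void copyFrom(int fromIndex, VarWidthVectorSketch from) {
        append(from.get(fromIndex));
    }

    private void append(byte[] value) {
        int start = offsets[rowCount];
        while (start + value.length > data.length) {
            data = Arrays.copyOf(data, Math.max(1, data.length * 2)); // resize: allocate and copy
            reallocations++;
        }
        System.arraycopy(value, 0, data, start, value.length);
        if (rowCount + 1 >= offsets.length) {
            offsets = Arrays.copyOf(offsets, offsets.length * 2);
        }
        offsets[rowCount + 1] = start + value.length;
        rowCount++;
    }

    // Transfer: hand the buffers to the target; constant work regardless of rowCount.
    void transferTo(VarWidthVectorSketch target) {
        target.data = data;
        target.offsets = offsets;
        target.rowCount = rowCount;
        data = new byte[0];        // the source is left empty
        offsets = new int[] {0};
        rowCount = 0;
    }

    public static void main(String[] args) {
        VarWidthVectorSketch probe = of("alpha", "beta", "gamma");

        VarWidthVectorSketch copied = new VarWidthVectorSketch();
        for (int i = 0; i < probe.rowCount; i++) {
            copied.copyFrom(i, probe); // one call per row
        }
        System.out.println("reallocations during copy: " + copied.reallocations);

        VarWidthVectorSketch transferred = new VarWidthVectorSketch();
        probe.transferTo(transferred); // no per-row work
        System.out.println(new String(transferred.get(2), StandardCharsets.UTF_8));
    }
}
```

With the three sample values and a 4-byte initial buffer, the copy path grows the data buffer twice, whereas the transfer does no per-row work at all; with 64K rows per batch the gap widens accordingly.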


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
