[ https://issues.apache.org/jira/browse/DRILL-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinfeng Ni reassigned DRILL-5586:
---------------------------------

    Assignee: Jinfeng Ni

> UnionAll operator does more than necessary value vector allocation and copy
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-5586
>                 URL: https://issues.apache.org/jira/browse/DRILL-5586
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> When the inputs to a UnionAll operator are simple field references, rather
> than expressions involving a function that must be evaluated, the operator
> should leverage the value vector's transfer API. A transfer avoids allocating
> a buffer for the value vector in the outgoing batch, as well as the overhead
> of copying the data from the incoming batch to the outgoing batch.
> For example, in the following query:
> {code}
> select l_orderkey from cp.`tpch/lineitem.parquet` l union all select
> n_nationkey from cp.`tpch/nation.parquet`
> {code}
> Both the left and right sides of the UnionAll operator are simple field
> references, and Drill should call the transfer API. However, the current code
> does buffer allocation and copy for both the left and right sides. Such
> processing significantly slows the UnionAll operator's performance, and
> eventually slows down query evaluation.
> DRILL-5521 reverts a change made in DRILL-5419 to the logic that decides
> whether to apply transfer, which was based on a SchemaPath equality
> comparison. Even if we fix that problem, a SchemaPath equality comparison is
> not sufficient as the criterion for whether transfer should be used. Ideally,
> even when the output field and the incoming field have different names, the
> UnionAll operator should do {{transfer}} instead of {{copy}}, as long as the
> expression is a simple field reference.
> {code}
> select l_orderkey as Key1 from cp.`tpch/lineitem.parquet` l union all select
> n_nationkey as Key2 from cp.`tpch/nation.parquet`
> {code}
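
To illustrate the distinction the issue draws between transfer and copy, here is a minimal, self-contained Java sketch. It does not use Drill's actual classes: IntVector, FieldRef, FunctionCall, canTransfer and project are hypothetical stand-ins invented for this example. The sketch assumes that "transfer" means handing buffer ownership from the incoming vector to the outgoing one, and that the decision depends only on whether the expression is a simple field reference, not on whether the output name matches the input name.

```java
import java.util.Arrays;

public class TransferVsCopySketch {

    private static final int[] EMPTY = {};

    /** Hypothetical toy stand-in for a value vector: it owns one int buffer. */
    static final class IntVector {
        int[] buffer = EMPTY;

        /** Copy path: allocates a new buffer and copies every value into it. */
        void copyFrom(IntVector incoming) {
            buffer = Arrays.copyOf(incoming.buffer, incoming.buffer.length);
        }

        /** Transfer path: hands the existing buffer to the target; no data is copied. */
        void transferTo(IntVector target) {
            target.buffer = buffer;
            buffer = EMPTY; // the source no longer owns the data
        }
    }

    /** Hypothetical toy expression model. */
    interface Expr {}

    /** A simple field reference such as l_orderkey. */
    static final class FieldRef implements Expr {
        final String name;
        FieldRef(String name) { this.name = name; }
    }

    /** An expression that needs evaluation, such as abs(l_orderkey). */
    static final class FunctionCall implements Expr {
        final String function;
        final Expr argument;
        FunctionCall(String function, Expr argument) {
            this.function = function;
            this.argument = argument;
        }
    }

    /** Transfer is possible whenever the expression is a plain field reference. */
    static boolean canTransfer(Expr expr) {
        return expr instanceof FieldRef;
    }

    /**
     * Produces the outgoing vector for one input column. outputName is
     * deliberately ignored by the decision: a rename (l_orderkey as Key1)
     * does not force a copy.
     */
    static IntVector project(Expr expr, String outputName, IntVector incoming) {
        IntVector outgoing = new IntVector();
        if (canTransfer(expr)) {
            incoming.transferTo(outgoing);
        } else {
            outgoing.copyFrom(incoming); // stand-in for evaluating and writing results
        }
        return outgoing;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        IntVector incoming = new IntVector();
        incoming.buffer = data;

        IntVector outgoing = project(new FieldRef("l_orderkey"), "Key1", incoming);
        System.out.println("same buffer reused: " + (outgoing.buffer == data));
    }
}
```

In this sketch the renamed column from the second query (l_orderkey as Key1) still takes the transfer path, because the check looks at the shape of the expression rather than comparing input and output paths.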