[ https://issues.apache.org/jira/browse/TRAFODION-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hans Zeller resolved TRAFODION-2392.
------------------------------------
    Resolution: Fixed
    Fix Version/s: 2.1-incubating

Fix checked in on 12/23/2016 with https://github.com/apache/incubator-trafodion/pull/882

> Avoid a costly sort for highly reducing TMUDFs
> ----------------------------------------------
>
>                 Key: TRAFODION-2392
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2392
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: sql-cmp
>    Affects Versions: 2.0-incubating
>         Environment: Any
>            Reporter: Hans Zeller
>            Assignee: Hans Zeller
>             Fix For: 2.1-incubating
>
>
> When an input table with a PARTITION BY is specified in a TMUDF, the
> Trafodion optimizer ensures that the input rows are sorted on (a permutation
> of) the PARTITION BY columns, so that each parallel TMUDF instance sees the
> input rows of such a logical partition in contiguous rows. This way the TMUDF
> can process each group separately.
> This is usually a good way to process the data, except when we are dealing
> with a large input table and a TMUDF that highly reduces the input data. In
> that case it may be better to maintain a hash table of groups in the TMUDF
> and to avoid the costly sort of the input table.
> My proposal is to add a new function type to UDRInvocationInfo.FunctionType,
> called REDUCER_NC (for Non-Contiguous). Setting the function type to this new
> type would indicate to the optimizer not to request a sort order on the
> partitioning columns.
> The table below shows how the function type and PARTITION BY and ORDER BY
> clauses would determine the effective sort order produced by the optimizer:
> ||Function type||PARTITION BY||ORDER BY||Data is sorted by||
> |REDUCER (existing)|a,b|c,d|a,b,c,d|
> |REDUCER (existing)|a,b|<empty>|a,b|
> |REDUCER_NC (proposed)|a,b|c,d|c,d|
> |REDUCER_NC (proposed)|a,b|<empty>|<no sort>|
> In all other aspects, REDUCER and REDUCER_NC function types would behave the
> same.
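
The sort-order rule in the table, and the "hash table of groups" idea for a REDUCER_NC-style function, can be sketched roughly as follows. This is an illustration only, written in plain Python rather than against the Trafodion UDR interface: the names `effective_sort_order` and `count_rows_per_group` are hypothetical, the function types are represented as plain strings, and rows are modelled as dictionaries. The first function mirrors the four rows of the table; the second shows how a highly reducing function might aggregate groups whose rows arrive in arbitrary order, with no sort on the PARTITION BY columns.

```python
from collections import defaultdict


def effective_sort_order(function_type, partition_by, order_by):
    """Return the columns the input is sorted by, following the issue's table.

    Hypothetical helper. For REDUCER the issue allows any permutation of the
    PARTITION BY columns; this sketch simply keeps them in the order given.
    An empty list means "no sort".
    """
    if function_type == "REDUCER":
        return list(partition_by) + list(order_by)
    if function_type == "REDUCER_NC":
        # Non-contiguous: no sort is requested on the partitioning columns.
        return list(order_by)
    raise ValueError("unsupported function type: " + function_type)


def count_rows_per_group(rows, partition_by):
    """Count rows per logical partition without requiring sorted input.

    Keeps one hash-table entry per group, so memory grows with the number
    of groups rather than the number of input rows.
    """
    groups = defaultdict(int)
    for row in rows:
        key = tuple(row[col] for col in partition_by)
        groups[key] += 1
    return dict(groups)


if __name__ == "__main__":
    print(effective_sort_order("REDUCER", ["a", "b"], ["c", "d"]))
    print(effective_sort_order("REDUCER_NC", ["a", "b"], ["c", "d"]))
    unsorted_rows = [
        {"a": 1, "b": 2},
        {"a": 3, "b": 4},
        {"a": 1, "b": 2},
    ]
    print(count_rows_per_group(unsorted_rows, ["a", "b"]))
```

The trade-off is the one described in the issue: the hash table avoids sorting a large input, which pays off when the number of distinct groups is small relative to the number of input rows.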
-- This message was sent by Atlassian JIRA (v6.3.4#6332)