Boaz Ben-Zvi created DRILL-6400:
-----------------------------------

             Summary: Hash-Aggr: Avoid recreating common Hash-Table setups for every partition
                 Key: DRILL-6400
                 URL: https://issues.apache.org/jira/browse/DRILL-6400
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - Relational Operators
    Affects Versions: 1.13.0
            Reporter: Boaz Ben-Zvi
            Assignee: Boaz Ben-Zvi
             Fix For: 1.14.0

The current Hash-Aggr code (and soon the Hash-Join code) creates multiple partitions to hold the incoming data, each partition with its own HashTable.

The current code invokes the HashTable method _createAndSetupHashTable()_ for *each* partition, but most of the setup done by this method is identical for all the partitions (e.g., code generation). Calling this method has a performance cost: some local tests measured between 3 and 30 milliseconds per call, depending on the key columns.

Suggested performance improvement: extract the common setup so that it runs *once*, and have all the partitions reuse its results. With the default 32 partitions, the per-call cost above adds up to roughly 0.1 to 1 second of setup, so the improvement should be measurable (and when spilling, this method is invoked again...).
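
The suggested change can be illustrated with a minimal, self-contained Java sketch. It is not Drill's actual code: the class and method names (SharedSetupSketch, CommonSetup, PartitionTable, buildCommonSetup, createPartitions) are hypothetical, and the partition-independent work (such as code generation) is reduced to a placeholder that only counts how often it runs. The sketch assumes the common setup depends only on the key columns and can safely be shared by all partitions.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedSetupSketch {

    // Hypothetical holder for partition-independent results (e.g., generated code).
    static final class CommonSetup {
        final String[] keyColumns;

        CommonSetup(String[] keyColumns) {
            this.keyColumns = keyColumns;
        }
    }

    // Hypothetical per-partition hash table that reuses the shared setup.
    static final class PartitionTable {
        final int partitionIndex;
        final CommonSetup setup;

        PartitionTable(int partitionIndex, CommonSetup setup) {
            this.partitionIndex = partitionIndex;
            this.setup = setup;
        }
    }

    // Counts how many times the (assumed expensive) common setup runs.
    static int setupCalls = 0;

    // Placeholder for the expensive, partition-independent work.
    static CommonSetup buildCommonSetup(String[] keyColumns) {
        setupCalls++;
        return new CommonSetup(keyColumns.clone());
    }

    // Runs the common setup once, then hands the same result to every partition.
    static List<PartitionTable> createPartitions(int numPartitions, String[] keyColumns) {
        CommonSetup shared = buildCommonSetup(keyColumns);
        List<PartitionTable> tables = new ArrayList<>(numPartitions);
        for (int i = 0; i < numPartitions; i++) {
            tables.add(new PartitionTable(i, shared));
        }
        return tables;
    }

    public static void main(String[] args) {
        List<PartitionTable> tables = createPartitions(32, new String[] {"a", "b"});
        // prints "partitions=32, setupCalls=1"
        System.out.println("partitions=" + tables.size() + ", setupCalls=" + setupCalls);
    }
}
```

In this sketch the setup cost is paid once instead of once per partition. If partitions are recreated after a spill, the same CommonSetup instance could be passed in again rather than rebuilt, under the same assumption that nothing in it is partition-specific.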