Benjamin Kim created HIVE-6230:
----------------------------------

             Summary: Hive UDAF with subquery runs all logic on reducers
                 Key: HIVE-6230
                 URL: https://issues.apache.org/jira/browse/HIVE-6230
             Project: Hive
          Issue Type: Bug
          Components: UDF
    Affects Versions: 0.10.0
            Reporter: Benjamin Kim

When my custom-built UDAF is applied to a subquery, iterate, terminatePartial, merge, and terminate all run on the reducers only, whereas iterate and terminatePartial should run on the mappers. I don't know whether this is by design, but it leads to very long execution times on the reducers and causes them to create large temporary files.

This happened to me with SimpleUDAF. I haven't tested it with GenericUDAF.

Here is an example:

    SELECT MyUDAF(col1)
    FROM (SELECT * FROM test) t
    GROUP BY col2
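
To make the difference concrete, here is a rough, Hive-free sketch in plain Java of the two execution shapes described above. Everything in it is an assumption for illustration: SumEvaluator is a hypothetical stand-in for MyUDAF (whose real logic is not shown in this report), the class and method names are made up, and the "reduce only" path simply mirrors the reported observation rather than Hive's actual planner. It also counts how many records would cross the shuffle in each case, which is where the large temporary files would come from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UdafLifecycleSketch {

    /** Hypothetical evaluator with the four callbacks named in the report. */
    static class SumEvaluator {
        private long sum = 0;
        private boolean empty = true;

        // Consumes one raw input row.
        boolean iterate(Long value) {
            if (value != null) { sum += value; empty = false; }
            return true;
        }

        // Emits the partial aggregate built so far.
        Long terminatePartial() { return empty ? null : sum; }

        // Folds a partial aggregate from another evaluator into this one.
        boolean merge(Long partial) {
            if (partial != null) { sum += partial; empty = false; }
            return true;
        }

        // Emits the final result.
        Long terminate() { return empty ? null : sum; }
    }

    // Expected shape: iterate + terminatePartial per input split ("mapper"),
    // then merge + terminate once ("reducer").
    static Long twoPhase(List<List<Long>> splits) {
        List<Long> partials = new ArrayList<>();
        for (List<Long> split : splits) {
            SumEvaluator mapSide = new SumEvaluator();
            for (Long v : split) mapSide.iterate(v);
            partials.add(mapSide.terminatePartial());
        }
        SumEvaluator reduceSide = new SumEvaluator();
        for (Long p : partials) reduceSide.merge(p);
        return reduceSide.terminate();
    }

    // Reported shape (simplified): all raw rows reach the "reducer",
    // which then runs all four callbacks itself.
    static Long reduceOnly(List<List<Long>> splits) {
        SumEvaluator first = new SumEvaluator();
        for (List<Long> split : splits)
            for (Long v : split) first.iterate(v);
        SumEvaluator second = new SumEvaluator();
        second.merge(first.terminatePartial());
        return second.terminate();
    }

    // Records crossing the shuffle: one partial per split vs. every raw row.
    static int shuffledTwoPhase(List<List<Long>> splits) { return splits.size(); }

    static int shuffledReduceOnly(List<List<Long>> splits) {
        int rows = 0;
        for (List<Long> split : splits) rows += split.size();
        return rows;
    }

    public static void main(String[] args) {
        List<List<Long>> splits = Arrays.asList(
                Arrays.asList(1L, 2L, 3L), Arrays.asList(4L));
        System.out.println("two-phase result:  " + twoPhase(splits)
                + ", shuffled records: " + shuffledTwoPhase(splits));
        System.out.println("reduce-only result: " + reduceOnly(splits)
                + ", shuffled records: " + shuffledReduceOnly(splits));
    }
}
```

Both paths return the same answer; only the amount of data moved to the reducers differs, which matches the symptom of slow reducers and large temporary files.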