Hi there, imagine you have a table of time-series transactions, e.g.: CREATE TABLE foobar (actor BIGINT, ts BIGINT, action STRING);
containing the rows: 1 2000 bar 1 1000 foo 2 1500 foo An interesting query would be to get a denormalized view on all actions of an actor sorted by timestamp: 1 [foo, bar] 2 [foo] This would require collecting the actions in a list in non-natural order. A naive query would look like this: SET hive.map.aggr = false; SELECT actor, COLLECT_LIST(action) AS actions FROM foobar GROUP BY actor SORT BY actor, ts; Unfortunately, as the column "ts" is not in the final result set, this produces a SemanticException [Error 10004]: Line 9:11 Invalid table alias or column reference 'ts'. I do understand that it is difficult to allow a global ORDER BY on column that is not in the result set. But the SORT BY only needs to ensure order within the reduce group. Is there a way to get this behaviour in Hive? Best regards Michael