Hi there,

imagine you have a table of time-series transactions, e.g.:
CREATE TABLE
    foobar (actor BIGINT, ts BIGINT, action STRING);

containing the rows:
1       2000    bar
1       1000    foo
2       1500    foo

An interesting query would be to get a denormalized view on all actions of an 
actor sorted by timestamp:
1       [foo, bar]
2       [foo]

This would require collecting the actions in a list in non-natural order. A 
naive query would look like this:
SET hive.map.aggr = false;
SELECT
    actor,
    COLLECT_LIST(action) AS actions
FROM
    foobar
GROUP BY
    actor
SORT BY
    actor, ts;

Unfortunately, as the column "ts" is not in the final result set, this produces 
a SemanticException [Error 10004]: Line 9:11 Invalid table alias or column 
reference 'ts'.


I do understand that it is difficult to allow a global ORDER BY on column that 
is not in the result set.
But the SORT BY only needs to ensure order within the reduce group. Is there a 
way to get this behaviour in Hive?

Best regards
Michael

Reply via email to