Hi, I have one data file which I converted to Parquet with CTAS.
The following query took about 35 seconds to execute:

    select action['login'], count(*)
    from dfs.datastore.events
    group by action['login'];

After splitting the original source into 4 equal parts, I created 4 views on these parts (events_0, events_1, events_2, events_3) and combined them in one view:

    create view dfs.datastore.events_combined as
    select t0.`timestamp` as event_time, t0.client_id, t0.action from dfs.datastore.events_0 t0
    union all
    select t1.`timestamp` as event_time, t1.client_id, t1.action from dfs.datastore.events_1 t1
    union all
    select t2.`timestamp` as event_time, t2.client_id, t2.action from dfs.datastore.events_2 t2
    union all
    select t3.`timestamp` as event_time, t3.client_id, t3.action from dfs.datastore.events_3 t3;

When I run the same query against this view, it executes much more slowly, taking about 500 seconds:

    select action['login'], count(*)
    from dfs.datastore.events_combined
    group by action['login'];

I expected roughly the same execution time, but it is more than ten times slower. What could cause this, and can it be fixed?
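In case it helps to narrow this down, the plans of the two queries could be compared with EXPLAIN PLAN FOR. This is only a sketch of the commands, using the same table and view names as above; I have not included any plan output here.

```sql
-- Plan for the query against the single Parquet table (the ~35 second case)
explain plan for
select action['login'], count(*)
from dfs.datastore.events
group by action['login'];

-- Plan for the same query against the UNION ALL view (the ~500 second case)
explain plan for
select action['login'], count(*)
from dfs.datastore.events_combined
group by action['login'];
```

Differences between the two plans, such as where the aggregation happens relative to the union or how many fragments scan the data, might point to the cause.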
