Hi there,

I have my data stored in HDFS partitioned by month in Parquet format.
The directory looks like this:

-month=201411
-month=201412
-month=201501
-....

I want to compute some aggregates for every timestamp.
How can I do that while taking advantage of the existing
partitioning?
One naive way I can think of is to issue multiple SQL queries:

SELECT * FROM TABLE WHERE month=201411
SELECT * FROM TABLE WHERE month=201412
SELECT * FROM TABLE WHERE month=201501
.....

then computing the aggregates on the results of each query and combining
them at the end.
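
To make the naive approach concrete, here is a minimal sketch of what I have
in mind. It is not my real setup: Python's sqlite3 with an in-memory table is
only a stand-in for the actual query engine, and the table name `events`, the
columns `ts` and `value`, the sample rows, and the sum/count aggregate are all
placeholders I made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (month, ts, value). In the real setup these rows
# would live in Parquet files under the month=... directories on HDFS.
ROWS = [
    (201411, "2014-11-03 10:00", 2.0),
    (201411, "2014-11-03 10:00", 4.0),
    (201412, "2014-12-01 09:00", 1.0),
    (201501, "2015-01-15 12:00", 5.0),
    (201501, "2015-01-15 12:00", 7.0),
]


def build_demo_table():
    """Create an in-memory table standing in for the partitioned data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (month INTEGER, ts TEXT, value REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", ROWS)
    return conn


def aggregate_month(conn, month):
    """Issue one query for a single month and aggregate its rows per timestamp."""
    partial = {}
    cursor = conn.execute(
        "SELECT ts, value FROM events WHERE month = ?", (month,)
    )
    for ts, value in cursor:
        total, count = partial.get(ts, (0.0, 0))
        partial[ts] = (total + value, count + 1)
    return partial


def naive_aggregates(conn, months):
    """Run the per-month queries one by one and combine the partial results."""
    combined = {}
    for month in months:
        for ts, (total, count) in aggregate_month(conn, month).items():
            prev_total, prev_count = combined.get(ts, (0.0, 0))
            combined[ts] = (prev_total + total, prev_count + count)
    return combined


if __name__ == "__main__":
    conn = build_demo_table()
    for ts, (total, count) in sorted(
        naive_aggregates(conn, [201411, 201412, 201501]).items()
    ):
        print(ts, total, count)
```

The part I dislike is the driver-side loop: one query per month, plus the
manual merge of the partial results.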

I think there should be a better way, right?

Thanks
