If it is possible to alter the table structure a little, the easiest way is to encode date and hour in one column and use that column as kylin partition column. E.g. have a new column called "date_and_hour". It takes values like 2018051310 to represent 10 am of 2018-05-13. And your query may go
select count(*) from fact where date_and_hour between 2018051200 and 2018051323 Cheers Yang On Fri, May 18, 2018 at 10:55 AM, <[email protected]> wrote: > My current approach to time hierarchy and partition is as follow: > > - I have partition date and hour column in Hive table to avoid > full Hive table scan. Column names are partition_date, partition_hour > > - I have separate fields in fact table name Year, Month, Day, > Hour and use these column as hierarchy dimensions in Kylin cube build. I > use *dictionary encoding*. > > - When I want to query time range, I have to list all > combination of time hierarchy dimensions, for example (Month, Day), in > order to query. > > My query seems to be slower when the cube get bigger with the same time > range. So I want to ask the best practice to design time hierarchy and > query time range in Kylin. I see some support for timestamp in Streaming > cube but I don’t see guideline for design time dimension for normal cube > except partition date and hour in Hive. > > I also suspect that my time range query get slower because it currently > need to scan all segment. > > I think we need > > >
