Hi, I have a use case where I need to compute daily, weekly, or monthly active user counts from raw hourly data, which is a large dataset. The raw data is updated in real time, and I want the distinct active user count for each time granularity. Can anyone suggest an efficient way to do this? For example, to get the daily distinct active user count, would I take each hourly dataset for that day and combine them? My initial thought is to use a key-value store with one hash set of user IDs per hour, then take the union of the hourly sets to get the daily distinct count. However, I am not sure whether this approach is efficient enough. I would appreciate any pointers.
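To make the idea concrete, here is a minimal sketch of the hourly-set approach. It assumes a plain in-memory Python dict as a stand-in for the key-value store, and the names `hourly_users`, `record_event`, and `daily_active_users` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical stand-in for the key-value store:
# key = hour bucket "YYYY-MM-DD-HH", value = set of user IDs seen in that hour.
hourly_users = defaultdict(set)


def record_event(user_id, ts):
    """Add a user ID to the set for the hour that contains timestamp ts."""
    hour_key = ts.strftime("%Y-%m-%d-%H")
    hourly_users[hour_key].add(user_id)


def daily_active_users(day):
    """Return the distinct user count for a day given as 'YYYY-MM-DD'.

    Unions the 24 hourly sets, so a user active in several hours
    is counted once.
    """
    distinct = set()
    for hour in range(24):
        # .get() avoids creating empty entries for hours with no activity.
        distinct |= hourly_users.get(f"{day}-{hour:02d}", set())
    return len(distinct)
```

Weekly or monthly counts would follow the same pattern by taking the union over more hour buckets. Memory grows with the number of distinct users per hour, which is the part I am unsure about at scale.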
Best, Sun. fightf...@163.com