Hi, 
I have a use case that needs daily, weekly, or monthly active user counts
computed from raw hourly data, which is a large dataset.
The raw data is updated continuously, and I want the distinct active
user count for each time dimension. Can anyone suggest an
efficient way to do this?
To get the daily distinct active user count, would I have to fetch each
hourly dataset for that day and do some calculation? My initial thought
is to use a key-value store with a hash set holding the user IDs for each hour. Then
I could merge and deduplicate the hourly user ID sets to get the
daily distinct count. However, I am not sure whether this implementation
would be efficient.
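
A minimal sketch of the idea in Python, assuming an in-memory dict stands in
for the key-value store. The names record_event and daily_active_users, and the
sample user IDs and timestamps, are made up for illustration only:

```python
from collections import defaultdict
from datetime import datetime

# In-memory stand-in for the key-value store (an assumption for this sketch):
# hour bucket "YYYY-MM-DD-HH" -> set of user IDs seen in that hour.
hourly_users = defaultdict(set)


def record_event(user_id, ts):
    """Add user_id to the set for the hour that contains ts."""
    hour_key = ts.strftime("%Y-%m-%d-%H")
    hourly_users[hour_key].add(user_id)


def daily_active_users(day):
    """Union the hourly sets of one day ("YYYY-MM-DD") and count distinct IDs."""
    distinct = set()
    for hour_key, users in hourly_users.items():
        if hour_key.startswith(day + "-"):
            distinct |= users  # set union removes duplicates across hours
    return len(distinct)


# Hypothetical sample events: u1 is active in two different hours of the same day.
record_event("u1", datetime(2015, 1, 1, 9, 5))
record_event("u2", datetime(2015, 1, 1, 9, 30))
record_event("u1", datetime(2015, 1, 1, 17, 0))
record_event("u3", datetime(2015, 1, 2, 8, 0))

print(daily_active_users("2015-01-01"))  # 2
```

Weekly or monthly counts would union the hourly sets over a longer key range
in the same way, so memory grows with the number of distinct users per period.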
I hope someone can shed a little light on this.

Best,
Sun.



fightf...@163.com
