Hi,
I am trying to figure out whether HBase is the right candidate for my use
case, which is as follows:
I have a users table containing millions of users, and for each user I have
a bunch of data points for each day in the past 2 years. Some of these data
points are the number of clicks in different parts of a web page, total
number of clicks, total searches, number of unique searches, etc. So the
data is in this form:
User Id   Date     X1 (Total Clicks)   X2 (Total Searches)   X3   …   Xn
1         D1-730   4                   0.8                   90
1         D1-729   2                   0.5                   50
…
1         D1       30                  0.9                   20
2         D1-730   23                  1.2                   85
2         D1-729   56                  2.3                   56
…
My application has the following predominant query pattern: for a subset
of users (the subset being quite large, on the order of 1-5 million), I
want to compute the sum, min, max, mean, and standard deviation of the data
points over date ranges that differ per user. For example, user1 may have a
start and end date of {sd1, ed1}, user2 may have {sd2, ed2}, and so on. I
want to compute sum, min, max, etc. for data points X1, X2, … Xn over the
date range ({sd1, ed1}, {sd2, ed2}, {sd3, ed3}, …) that belongs to each
user in the subset.
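To make the computation concrete, here is a minimal Python sketch of the
result I want per user. The function name aggregate, the in-memory row
layout, the calendar dates, and the use of population standard deviation
are assumptions made only for illustration; the metric values are taken
from the sample rows above.

```python
import statistics
from collections import defaultdict
from datetime import date


def aggregate(rows, ranges):
    """Compute per-user, per-metric statistics over each user's own date range.

    rows:   iterable of (user_id, day, values), values being [x1, x2, ..., xn]
    ranges: dict mapping user_id -> (start_day, end_day), both inclusive
    """
    selected = defaultdict(list)
    for user_id, day, values in rows:
        bounds = ranges.get(user_id)
        # Keep only rows for users in the subset, inside that user's range.
        if bounds is not None and bounds[0] <= day <= bounds[1]:
            selected[user_id].append(values)

    result = {}
    for user_id, kept in selected.items():
        # zip(*kept) turns the list of rows into one tuple per metric column.
        result[user_id] = [
            {
                "sum": sum(column),
                "min": min(column),
                "max": max(column),
                "mean": statistics.fmean(column),
                # Population standard deviation (an assumption).
                "std": statistics.pstdev(column),
            }
            for column in zip(*kept)
        ]
    return result


# Made-up dates; the metric values mirror the sample table.
rows = [
    (1, date(2013, 1, 1), [4, 0.8, 90]),
    (1, date(2013, 1, 2), [2, 0.5, 50]),
    (1, date(2013, 1, 3), [30, 0.9, 20]),
    (2, date(2013, 1, 1), [23, 1.2, 85]),
    (2, date(2013, 1, 2), [56, 2.3, 56]),
]
ranges = {
    1: (date(2013, 1, 1), date(2013, 1, 2)),
    2: (date(2013, 1, 2), date(2013, 1, 2)),
}
result = aggregate(rows, ranges)
```

Here result[1][0] holds the statistics of X1 for user 1 over that user's
range, result[1][1] the statistics of X2, and so on.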
Currently we do this in a database by creating a table for the subset of
users with their start and end days and joining it against the users table.
The query, however, is extremely slow and takes hours to execute.
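As a rough illustration, the join has the following shape. This is only a
sketch using Python's sqlite3 module with an in-memory database; the table
names user_stats and user_ranges, the column names, and the integer day
index are hypothetical and not the actual schema. SQLite has no built-in
standard deviation, so the sketch derives a population standard deviation
from AVG(x*x) - AVG(x)^2.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_stats (user_id INTEGER, day INTEGER, x1 REAL, x2 REAL);
CREATE TABLE user_ranges (user_id INTEGER PRIMARY KEY,
                          start_day INTEGER, end_day INTEGER);
""")
# Hypothetical sample data: day is an integer index, not a real date.
conn.executemany("INSERT INTO user_stats VALUES (?, ?, ?, ?)", [
    (1, 1, 4, 0.8), (1, 2, 2, 0.5), (1, 3, 30, 0.9),
    (2, 1, 23, 1.2), (2, 2, 56, 2.3),
])
conn.executemany("INSERT INTO user_ranges VALUES (?, ?, ?)",
                 [(1, 1, 2), (2, 2, 2)])

# Join the per-user ranges against the daily rows, then aggregate per user.
query = """
SELECT s.user_id,
       SUM(s.x1), MIN(s.x1), MAX(s.x1), AVG(s.x1),
       AVG(s.x1 * s.x1) - AVG(s.x1) * AVG(s.x1)
FROM user_stats AS s
JOIN user_ranges AS r ON r.user_id = s.user_id
WHERE s.day BETWEEN r.start_day AND r.end_day
GROUP BY s.user_id
ORDER BY s.user_id
"""

summary = {}
for user_id, total, low, high, mean, variance in conn.execute(query):
    summary[user_id] = {
        "sum": total,
        "min": low,
        "max": high,
        "mean": mean,
        # Clamp tiny negative values caused by floating-point rounding.
        "std": math.sqrt(max(variance, 0.0)),
    }
conn.close()
```

Only X1 is aggregated here; the other columns would repeat the same
expressions.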
I am trying to figure out the following:
1. Can I do the above query efficiently using HBase? (I want to reduce
the query time; space is not that big of a concern for me.)
2. If HBase is not the right solution for such a use case, can someone
please suggest alternative solutions?
Thanks,
dlg