I am assuming that your read pattern is based on user sessions, i.e., your
user logs in and then chances are that you will have to look at various
things for this user, such as his logs, his searches, etc.
I was investigating a similar problem, and from the info I collected
this is the architecture I came up with:
- a single table,
- a single column family,
- store all of the different types of data for a user under multiple
  row keys which are "close" to each other for this user (*).
Only this way can you be sure that all of a user's data is co-located,
i.e. likely to fit into the same or adjacent regions.
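
To make "close" keys concrete, here is a minimal sketch. It relies only on
the fact that HBase stores rows sorted lexicographically by row key bytes.
The helper name make_row_key, the field widths and the sample values are
illustrative assumptions, not something from the original question.

```python
# Hypothetical key layout USER_ID/TYPE/TIMESTAMP. The numeric fields are
# zero-padded to a fixed width (an assumption) so that byte-wise ordering
# matches numeric ordering.
def make_row_key(user_id: int, event_type: str, timestamp_ms: int) -> bytes:
    return f"{user_id:010d}/{event_type}/{timestamp_ms:013d}".encode("ascii")

# A handful of made-up events for three users, sorted the way a
# lexicographically ordered table would store them.
keys = sorted([
    make_row_key(42, "search", 1328086740000),
    make_row_key(7, "login", 1328086700000),
    make_row_key(42, "login", 1328086745000),
    make_row_key(42, "view", 1328086750000),
    make_row_key(100, "search", 1328086760000),
])

# All rows of user 42 share a prefix, so they sit next to each other
# and can be read with a single prefix scan.
user_42_prefix = b"0000000042/"
positions = [i for i, k in enumerate(keys) if k.startswith(user_42_prefix)]
```

Because the rows for one user form one contiguous key range, they will
usually land in a single region, or in two adjacent ones if a region
boundary happens to fall inside that range.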
With this design and the right tuning, all of the data belonging to one
user is likely to sit on only one region server (as opposed to
being distributed over many region servers).
Having only one region server for all kinds of session data has a lot of
advantages: less overhead, fewer connections, and if one region server is
down, fewer users are affected in total, etc.
(*) Of course, while also making sure that the whole set of keys has
a reasonable distribution.
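
One common way to get both properties is to put a short hash of the user ID
in front of the key. The sketch below is only an illustration under my own
assumptions: the name make_distributed_key, the choice of MD5 and the
4-character prefix are not from the original question.

```python
import hashlib

# Hypothetical layout HASH/USER_ID/TYPE/TIMESTAMP. The hash prefix depends
# only on the user ID, so sequential user IDs are scattered over the key
# space while all rows of one user still share a common prefix.
def make_distributed_key(user_id: int, event_type: str, timestamp_ms: int) -> bytes:
    prefix = hashlib.md5(str(user_id).encode("ascii")).hexdigest()[:4]
    return f"{prefix}/{user_id:010d}/{event_type}/{timestamp_ms:013d}".encode("ascii")

# Made-up events for 50 users, sorted as a lexicographically ordered
# table would store them.
all_keys = sorted(
    make_distributed_key(uid, etype, 1328086740000 + uid)
    for uid in range(1, 51)
    for etype in ("login", "search", "view")
)

def user_positions(user_id: int) -> list:
    # Indices of one user's rows in the sorted key list.
    marker = f"/{user_id:010d}/".encode("ascii")
    return [i for i, k in enumerate(all_keys) if marker in k]
```

The trade-off is that scanning users in ID order is no longer possible
with such keys, which should be acceptable if reads are always per user.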
On 02/01/2012 08:59 AM, Mark wrote:
We would like to track all of our users' interactions ordered by time:
product views, searches, logins, etc. There are (at least) two ways of
accomplishing this:
We could use one table 'user_logs' and have keys in the format
USER_ID/TYPE/TIMESTAMP. Type could be product view, search, login, etc.
Or we could have a separate table for each type: UserProductLogs,
UserSearchLogs, etc.
What are the pros/cons of each strategy and which one do you think I
should employ?
- M