Hi all,
Right now I am aggregating our log data and populating tables based on how
we want to query the data later. Currently I have eleven different aggregation
tables, and the date is part of the row key. Since we usually slice our data by
day, I was wondering if it would be better to create a separate aggregation
table for each date.
I would no longer have to use the date as part of the start/stop row keys in a
scan, and it would be easier to prune old data. I would also guess there would
be less contention on tables between the process that populates the table and
the processes that query the table. The main problem I see, with my limited
knowledge of HBase, is that the tables will end up being rather small, so each
one would most likely end up on a single region server.
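
To make the scan bounds concrete, here is a minimal sketch of how a date prefix
turns into start/stop row keys. It assumes a hypothetical key layout of
YYYYMMDD followed by the rest of the key; the function name and the sample keys
are made up for illustration and are not the real schema.

```python
def date_scan_range(day: str) -> tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering every key prefixed with `day`.

    Assumes row keys begin with the date as 8 ASCII digits (YYYYMMDD).
    The stop row is exclusive, as in an HBase scan.
    """
    if len(day) != 8 or not day.isdigit():
        raise ValueError("day must look like YYYYMMDD")
    start = day.encode("ascii")
    # Smallest key that sorts after every key with this prefix:
    # bump the last byte of the prefix by one.
    stop = bytearray(start)
    stop[-1] += 1
    return start, bytes(stop)


start, stop = date_scan_range("20100319")
# Keys for that day fall inside [start, stop); other days do not.
in_range = start <= b"20100319|page_views" < stop
out_of_range = start <= b"20100320|page_views" < stop
```

With one table per date, both bounds would go away and a scan would cover the
whole table instead.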
Long story short, is this a good idea?
Thanks
-Pete