I am trying to design an HBase schema for a log processing application. We will receive new logs every day.
1) We are thinking of keeping each day's data in separate tables, with names like XYZ-2012-02-26. There will be at most 4 tables per day.

Pros: Processes that are working on old data are not affected while a new day's data is being prepared. Old data that is no longer needed is easy to delete: just drop the tables.

Cons: Lots of tables to deal with. Are there any others?

(The other option is, of course, to create one table of dates and have the other tables use row keys that contain the date at the end of the key.)

2) We are thinking the row keys will be strings with a separator character, e.g. ordernum*itemnum. The keys will contain only IDs, and these IDs will be small, probably 6 digits each.

Pros: It is easier to look up and search for data using the HBase shell. Very easy to implement.

Cons: As pointed out here (http://hbase.apache.org/book/rowkey.design.html), strings need nearly 3x the bytes.

(The other option is to create separate classes for compound row keys. Is it worth the effort?)

Is there a general consensus regarding these issues? Thanks in advance for your help.
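
To make the size trade-off in point 2 concrete, here is a minimal sketch that builds the same compound key both ways and compares the byte counts. The function names `string_key` and `binary_key` are hypothetical, and the zero padding to 6 digits and the 4-byte unsigned integer encoding are my assumptions, not something we have settled on. It uses only plain Python byte handling, not the HBase client.

```python
import struct


def string_key(order_num: int, item_num: int, sep: str = "*") -> bytes:
    """Compound key as zero-padded decimal text, e.g. b'000042*000007'.

    Zero padding (an assumption) keeps the byte-wise sort order
    the same as the numeric order.
    """
    return f"{order_num:06d}{sep}{item_num:06d}".encode("ascii")


def binary_key(order_num: int, item_num: int) -> bytes:
    """Compound key as two big-endian unsigned 32-bit integers.

    Big-endian encoding (an assumption) also sorts byte-wise in
    numeric order, and needs no separator because the width is fixed.
    """
    return struct.pack(">II", order_num, item_num)


s = string_key(123456, 654321)
b = binary_key(123456, 654321)
print(len(s), len(b))  # 13 8
```

With 6-digit IDs the string form costs 13 bytes against 8 for the binary form, so the gap here is well under 3x. As far as I can tell, the "nearly 3x" figure on that page is about a full 8-byte long written out as decimal text, which is a much larger number than our IDs.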
