One table per day just doesn't make sense. If your most frequent use case is retrieving data for a single day, you might want to design your row keys to include the date, or simply use rowKey = "year-day of year". HBase sorts row keys lexicographically, so you can easily retrieve all records for a day by using a scan along with row key filters. To analyze data from multiple days, you can then use MapReduce jobs.
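As a rough illustration of why a date-prefixed key makes the single-day lookup cheap, here is a minimal sketch in plain Python. It does not talk to HBase; a sorted list stands in for the table, and the helper names (`day_prefix`, `make_row_key`, `scan_day`), the `|` separator, and the record IDs are all hypothetical. The one assumption that matters is that the day of year is zero-padded, so that string order matches date order.

```python
import bisect
from datetime import date


def day_prefix(d: date) -> str:
    """Build 'YYYY-DDD'; zero-padding keeps string order equal to date order."""
    return f"{d.year:04d}-{d.timetuple().tm_yday:03d}"


def make_row_key(d: date, record_id: str) -> str:
    """Hypothetical layout: date prefix, a '|' separator, then a record ID."""
    return f"{day_prefix(d)}|{record_id}"


def scan_day(sorted_keys, d: date):
    """Return all keys for one day from a lexically sorted key list.

    This mimics a range scan bounded by a start key and a stop key.
    """
    prefix = day_prefix(d) + "|"
    # Stop key: the prefix with its last character bumped by one, which is
    # the smallest string greater than every key starting with the prefix.
    stop_key = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    start = bisect.bisect_left(sorted_keys, prefix)
    stop = bisect.bisect_left(sorted_keys, stop_key)
    return sorted_keys[start:stop]


# The sorted list plays the role of the table's row key order
# (ASCII keys sort the same way as strings and as bytes).
keys = sorted([
    make_row_key(date(2012, 2, 26), "000017"),
    make_row_key(date(2012, 2, 27), "000002"),
    make_row_key(date(2012, 2, 26), "000003"),
    make_row_key(date(2012, 2, 28), "000001"),
])

feb26 = scan_day(keys, date(2012, 2, 26))
print(feb26)
```

Because all keys for a day are contiguous in sort order, the lookup only touches that one slice of the key space. Without the zero-padding, "2012-9" would sort after "2012-10" and the range would no longer line up with the calendar.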
- Rohit Kelkar

On Mon, Feb 27, 2012 at 1:15 PM, Something Something <[email protected]> wrote:
> >>>why you even need hbase to store logs
> So that all the useful information in the logs can be sliced & diced anyway
> we want quickly without the need for sequential search. Isn't indexed
> search faster than sequential? Isn't that why HBase (and BigTable for that
> matter) was created in the first place?
>
> >>>you will not only have to deal with many tables
> This was pointed out in the 'Cons' section. We understand that but
> deleting data older than 60 days is very easy. Just need to delete those
> tables.
>
> >>>... when data changes is going to be unnecessarily complex.
> Once created, data will NOT change. The data is from logs from previous
> days. It's historical data.
>
> >>>if you have different tables for different days, it will get cumbersome
> to search..
> When user needs data across multiple dates, we can either get data
> sequentially for each day for small queries OR for long running queries get
> data by running queries in parallel for each day & then combining results
> for all days. Keeping HBase Regions separate for each day does provide
> some performance benefits - we think. This is where we need help from the
> community.
>
> >>>so if you can give more details on what you want to do with the stored
> data
> Hmm. The 2nd question is more about understanding the pros & cons of using
> 'String' Vs 'Custom Class' for Row Keys.
>
> Thanks.
>
>
> On Sun, Feb 26, 2012 at 10:48 PM, T Vinod Gupta <[email protected]> wrote:
>
>> before even getting into schema design, im curious to know why you even
>> need hbase to store logs?
>>
>> coming to the options below, option 1 sounds very naive and
>> unsophisticated.. you will not only have to deal with many tables but the
>> processing around the times when date changes is going to be unnecessarily
>> complex. besides, most common use of logs is to search for stuff. if you
>> have different tables for different days, it will get cumbersome to
>> search..
>>
>> regarding the right schema, it all depends on your use case. so if you can
>> give more details on what you want to do with the stored data, that helps.
>> the row key, column family and column name structure depends on what is
>> your access pattern (both reads and writes) and sorting requirements.
>>
>> thanks
>>
>> On Sun, Feb 26, 2012 at 10:24 PM, Something Something <
>> [email protected]> wrote:
>>
>> > Trying to design a HBase schema for a log processing application. We will
>> > get new logs every day.
>> >
>> > 1) We are thinking we will keep data for each day in separate tables. The
>> > table names would be something like XYZ-2012-02-26 etc. There will be at
>> > most 4 tables for each day.
>> >
>> > Pros:
>> > Other processes that are processing old data are not affected while data is
>> > getting ready for each day.
>> > It's easier to delete old data that's no longer needed. Just delete the
>> > tables.
>> >
>> > Cons:
>> > Lots of tables to deal with.
>> > Any other??
>> >
>> > (Other option is, of course, to create a Table with dates and other tables
>> > will have keys that contain date - at the end of the row key).
>> >
>> >
>> > 2) We are thinking the RowKeys will be in String format with a separator
>> > character e.g. ordernum*itemnum. The keys will only contain IDs & these
>> > IDs will be small, probably 6 digits each.
>> >
>> > Pros:
>> > It's easier to look/search for data using HBase Shell.
>> > Very easy to implement.
>> >
>> > Cons:
>> > As pointed out here (http://hbase.apache.org/book/rowkey.design.html),
>> > Strings need nearly 3x the bytes.
>> >
>> > (Other option is to create a separate Classes for compound row keys. Is it
>> > worth the effort?)
>> >
>> >
>> > Is there a general consensus regarding these issues? Thanks in advance for
>> > your help.
>> >
>>
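On the 'String' vs 'Custom Class' question in the quoted thread, the size difference for a compound ordernum*itemnum key can be checked with a few lines of plain Python. This is only a sketch under assumptions: the function names are hypothetical, the IDs are taken to be non-negative and to fit in 32 bits, and the binary layout (two big-endian unsigned 4-byte integers) is one possible encoding, not something the thread prescribes.

```python
import struct


def string_key(order_num: int, item_num: int) -> bytes:
    """String form 'ordernum*itemnum', zero-padded to 6 digits per ID."""
    return f"{order_num:06d}*{item_num:06d}".encode("ascii")


def packed_key(order_num: int, item_num: int) -> bytes:
    """Binary form: two big-endian unsigned 32-bit integers (an assumed layout)."""
    return struct.pack(">II", order_num, item_num)


def unpack_key(key: bytes):
    """Recover (order_num, item_num) from the binary form."""
    return struct.unpack(">II", key)


s = string_key(123456, 654321)
p = packed_key(123456, 654321)
print(len(s), len(p))   # byte lengths of the two encodings
print(unpack_key(p))    # round trip back to the two IDs

# Big-endian packing keeps byte order equal to numeric order,
# and so does the zero-padded string form.
assert packed_key(2, 5) < packed_key(10, 1)
assert string_key(2, 5) < string_key(10, 1)
# Without zero-padding, the string form sorts incorrectly.
assert b"10*1" < b"2*5"
```

For two 6-digit IDs the string key comes to 13 bytes against 8 for the packed form, so the saving is real but smaller than 3x. The packed form is also unreadable in the HBase Shell, which is the trade-off the original question raises.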
