If you want data to be deleted after a while, you can set a TTL on the table (in HBase the TTL is configured per column family). Based on what you described, you can add the date to your row keys (preferably as a suffix) so that you can do scans/gets appropriately for your use case.
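To make the suggestion concrete, here is a rough sketch of a date-suffixed string row key built on the ordernum*itemnum format from your mail. The zero-padding to 6 digits, the YYYYMMDD date format, and all function names are my own assumptions for illustration, not anything HBase requires. The 60-day retention is taken from your earlier message; in the HBase shell it would be set on the column family with something like `create 'logs', {NAME => 'd', TTL => 5184000}`, where the table and family names are made up.

```python
from datetime import date

SEP = "*"                       # separator from the original ordernum*itemnum idea
RETENTION_DAYS = 60             # retention period mentioned in the thread
TTL_SECONDS = RETENTION_DAYS * 24 * 60 * 60   # HBase column-family TTL is in seconds


def make_row_key(order_num: int, item_num: int, day: date) -> str:
    """Build 'ordernum*itemnum*YYYYMMDD' (hypothetical layout).

    IDs are zero-padded to 6 digits so that keys sort lexicographically
    in the same order as the numeric IDs.
    """
    return SEP.join([f"{order_num:06d}", f"{item_num:06d}", day.strftime("%Y%m%d")])


def parse_row_key(key: str) -> tuple[int, int, date]:
    """Split a key produced by make_row_key back into its parts."""
    order_num, item_num, day = key.split(SEP)
    return (
        int(order_num),
        int(item_num),
        date(int(day[:4]), int(day[4:6]), int(day[6:8])),
    )


def order_scan_range(order_num: int) -> tuple[str, str]:
    """Start/stop rows covering every item and date of one order.

    The stop row is the prefix with its last character bumped by one,
    so it is exclusive, matching how a scan's stop row is treated.
    """
    prefix = f"{order_num:06d}{SEP}"
    stop = prefix[:-1] + chr(ord(SEP) + 1)
    return prefix, stop


if __name__ == "__main__":
    key = make_row_key(42, 7, date(2012, 2, 26))
    print(key)
    print(parse_row_key(key))
    print(order_scan_range(42))
    print(TTL_SECONDS)
```

With the date at the end, all rows for one order stay next to each other, so a single scan over the range from `order_scan_range` returns every day for that order, and the TTL takes care of expiry without dropping tables.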
thanks

On Sun, Feb 26, 2012 at 11:45 PM, Something Something <[email protected]> wrote:

> >>why you even need hbase to store logs
> So that all the useful information in the logs can be sliced & diced anyway
> we want quickly without the need for sequential search. Isn't indexed
> search faster than sequential? Isn't that why HBase (and BigTable for that
> matter) was created in the first place?
>
> >>you will not only have to deal with many tables
> This was pointed out in the 'Cons' section. We understand that but
> deleting data older than 60 days is very easy. Just need to delete those
> tables.
>
> >>... when data changes is going to be unnecessarily complex.
> Once created, data will NOT change. The data is from logs from previous
> days. It's historical data.
>
> >>if you have different tables for different days, it will get cumbersome to search..
> When user needs data across multiple dates, we can either get data
> sequentially for each day for small queries OR for long running queries get
> data by running queries in parallel for each day & then combining results
> for all days. Keeping HBase Regions separate for each day does provide
> some performance benefits - we think. This is where we need help from the
> community.
>
> >>so if you can give more details on what you want to do with the stored data
> Hmm. The 2nd question is more about understanding the pros & cons of using
> 'String' Vs 'Custom Class' for Row Keys.
>
> Thanks.
>
> On Sun, Feb 26, 2012 at 10:48 PM, T Vinod Gupta <[email protected]> wrote:
>
> > before even getting into schema design, im curious to know why you even
> > need hbase to store logs?
> >
> > coming to the options below, option 1 sounds very naive and
> > unsophisticated.. you will not only have to deal with many tables but the
> > processing around the times when date changes is going to be unnecessarily
> > complex. besides, most common use of logs is to search for stuff. if you
> > have different tables for different days, it will get cumbersome to
> > search..
> >
> > regarding the right schema, it all depends on your use case. so if you can
> > give more details on what you want to do with the stored data, that helps.
> > the row key, column family and column name structure depends on what is
> > your access pattern (both reads and writes) and sorting requirements.
> >
> > thanks
> >
> > On Sun, Feb 26, 2012 at 10:24 PM, Something Something <[email protected]> wrote:
> >
> > > Trying to design a HBase schema for a log processing application. We will
> > > get new logs every day.
> > >
> > > 1) We are thinking we will keep data for each day in separate tables. The
> > > table names would be something like XYZ-2012-02-26 etc. There will be at
> > > most 4 tables for each day.
> > >
> > > Pros:
> > > Other processes that are processing old data are not affected while data is
> > > getting ready for each day.
> > > It's easier to delete old data that's no longer needed. Just delete the
> > > tables.
> > >
> > > Cons:
> > > Lots of tables to deal with.
> > > Any other??
> > >
> > > (Other option is, of course, to create a Table with dates and other tables
> > > will have keys that contain date - at the end of the row key).
> > >
> > > 2) We are thinking the RowKeys will be in String format with a separator
> > > character e.g. ordernum*itemnum. The keys will only contain IDs & these
> > > IDs will be small, probably 6 digits each.
> > >
> > > Pros:
> > > It's easier to look/search for data using HBase Shell.
> > > Very easy to implement.
> > >
> > > Cons:
> > > As pointed out here (http://hbase.apache.org/book/rowkey.design.html),
> > > Strings need nearly 3x the bytes.
> > >
> > > (Other option is to create a separate Classes for compound row keys. Is it
> > > worth the effort?)
> > >
> > > Is there a general consensus regarding these issues? Thanks in advance for
> > > your help.
