Re: Couple of schema design questions

Rohit Kelkar Mon, 27 Feb 2012 00:12:04 -0800

One table per day just doesn't make sense. If your most frequent use
case is retrieving data for a single day, then you might want to design
your row keys to include the date, or simply use
rowKey = "year-day of year".
HBase sorts row keys lexicographically.
This way you can easily retrieve all records for a day by using a scan
along with row key filters.
To analyze data from multiple days, you can then use MapReduce jobs.
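
A minimal sketch of the idea in plain Python (no HBase client involved), assuming the date is the key prefix and the day of year is zero-padded to three digits so that lexical order matches calendar order. The record_id suffix and the helper names are hypothetical. The sorted list only mimics how a scan bounded by a start row and a stop row would pick out one day:

```python
from bisect import bisect_left
from datetime import date


def day_prefix(d: date) -> str:
    """Return 'YYYY-DDD' with the day of year zero-padded to three digits."""
    return f"{d.year:04d}-{d.timetuple().tm_yday:03d}"


def row_key(d: date, record_id: str) -> str:
    # record_id is a hypothetical per-record suffix that keeps keys unique.
    return f"{day_prefix(d)}-{record_id}"


def scan_day(sorted_keys, d: date):
    """Mimic a scan with an inclusive start row and an exclusive stop row."""
    start = day_prefix(d) + "-"
    stop = day_prefix(d) + "."  # '.' is the ASCII character right after '-'
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Keys are kept in sorted order, standing in for a table sorted by row key.
keys = sorted([
    row_key(date(2012, 2, 26), "000002"),
    row_key(date(2012, 1, 5), "000001"),
    row_key(date(2012, 2, 26), "000001"),
    row_key(date(2012, 2, 27), "000001"),
])

feb26 = scan_day(keys, date(2012, 2, 26))
print(feb26)
```

Without the zero padding, "2012-57" would sort after "2012-100", and a range over consecutive days would no longer be contiguous.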


- Rohit Kelkar

On Mon, Feb 27, 2012 at 1:15 PM, Something Something
<[email protected]> wrote:
>>>why you even need hbase to store logs
> So that all the useful information in the logs can be sliced & diced any way
> we want quickly without the need for sequential search.  Isn't indexed
> search faster than sequential?  Isn't that why HBase (and BigTable for that
> matter) was created in the first place?
>
>>>you will not only have to deal with many tables
> This was pointed out in the 'Cons' section.  We understand that but
> deleting data older than 60 days is very easy.  Just need to delete those
> tables.
>
>>>... when data changes is going to be unnecessarily complex.
> Once created, data will NOT change.  The data is from logs from previous
> days.  It's historical data.
>
>>>if you have different tables for different days, it will get cumbersome to search..
> When user needs data across multiple dates, we can either get data
> sequentially for each day for small queries OR for long running queries get
> data by running queries in parallel for each day & then combining results
> for all days.  Keeping HBase Regions separate for each day does provide
> some performance benefits - we think.  This is where we need help from the
> community.
>
>>>so if you can give more details on what you want to do with the stored data
> Hmm.  The 2nd question is more about understanding the pros & cons of using
> 'String' Vs 'Custom Class' for Row Keys.
>
> Thanks.
>
>
> On Sun, Feb 26, 2012 at 10:48 PM, T Vinod Gupta <[email protected]> wrote:
>
>> before even getting into schema design, im curious to know why you even
>> need hbase to store logs?
>>
>> coming to the options below, option 1 sounds very naive and
>> unsophisticated.. you will not only have to deal with many tables but the
>> processing around the times when date changes is going to be unnecessarily
>> complex. besides, most common use of logs is to search for stuff. if you
>> have different tables for different days, it will get cumbersome to
>> search..
>>
>> regarding the right schema, it all depends on your use case. so if you can
>> give more details on what you want to do with the stored data, that helps.
>> the row key, column family and column name structure depends on what is
>> your access pattern (both reads and writes) and sorting requirements.
>>
>> thanks
>>
>> On Sun, Feb 26, 2012 at 10:24 PM, Something Something <
>> [email protected]> wrote:
>>
>> > Trying to design a HBase schema for a log processing application.  We
>> > will get new logs every day.
>> >
>> > 1)  We are thinking we will keep data for each day in separate tables.
>> > The table names would be something like  XYZ-2012-02-26 etc.  There
>> > will be at most 4 tables for each day.
>> >
>> > Pros:
>> > Other processes that are processing old data are not affected while
>> > data is getting ready for each day.
>> > It's easier to delete old data that's no longer needed.  Just delete the
>> > tables.
>> >
>> > Cons:
>> > Lots of tables to deal with.
>> > Any other??
>> >
>> > (Other option is, of course, to create a Table with dates and other
>> > tables will have keys that contain date - at the end of the row key).
>> >
>> >
>> > 2)  We are thinking the RowKeys will be in String format with a separator
>> > character e.g.  ordernum*itemnum.  The keys will only contain IDs & these
>> > IDs will be small, probably 6 digits each.
>> >
>> > Pros:
>> > It's easier to look/search for data using HBase Shell.
>> > Very easy to implement.
>> >
>> > Cons:
>> > As pointed out here (http://hbase.apache.org/book/rowkey.design.html),
>> > Strings need nearly 3x the bytes.
>> >
>> > (Other option is to create separate Classes for compound row keys.
>> > Is it worth the effort?)
>> >
>> >
>> > Is there a general consensus regarding these issues?  Thanks in advance
>> > for your help.
>> >
>>
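
On the second question in the quoted thread (String row keys vs. a custom class), a rough size comparison may help. This is only a sketch in plain Python: the function names are hypothetical, and packing each ID as an unsigned 32-bit big-endian integer is an assumed encoding, not something the thread prescribes. It compares a zero-padded "ordernum*itemnum" string key with a fixed-width binary key for 6-digit IDs:

```python
import struct


def string_key(order_num: int, item_num: int) -> bytes:
    """'ordernum*itemnum' with each ID zero-padded to 6 digits, as ASCII."""
    return f"{order_num:06d}*{item_num:06d}".encode("ascii")


def packed_key(order_num: int, item_num: int) -> bytes:
    """Two unsigned 32-bit big-endian integers (an assumed binary encoding)."""
    return struct.pack(">II", order_num, item_num)


s = string_key(123456, 42)
p = packed_key(123456, 42)

print(len(s), "bytes as a string key")  # 6 digits + separator + 6 digits
print(len(p), "bytes as a packed key")  # 4 bytes + 4 bytes

# Big-endian unsigned integers compare byte-wise in numeric order,
# so the packed key still sorts correctly as raw bytes.
print(packed_key(2, 0) < packed_key(10, 0))
```

Under these assumptions the string key is 13 bytes and the packed key is 8, so for 6-digit IDs the saving is well under 3x. The price of the packed form is that keys are no longer readable in the HBase shell, which is the "Pros" point made for strings above.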
