Before even getting into schema design, I'm curious why you need HBase to
store logs at all.

Coming to the options below, option 1 sounds naive. You will not only have
to deal with many tables, but the processing around the time the date
changes is going to be unnecessarily complex. Besides, the most common use
of logs is searching them. If you have different tables for different days,
searching gets cumbersome, since a query that spans several days has to be
run against every table in the range.
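
To illustrate the search point, here is a rough sketch (not a
recommendation for your exact schema) of how a single table with the date in
the row key lets a multi-day search be one contiguous range. A TreeMap
stands in for the HBase table, since both keep rows sorted by key. The key
layout (yyyyMMdd, then '|', then an event id) and all names are assumptions
made up for this example. Also keep in mind that keys which start with a
date are written in increasing order, which tends to concentrate writes on
one region.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: a sorted in-memory map stands in for one HBase table.
// For ASCII-only keys, String ordering matches byte-wise row key ordering.
class SingleTableLogSketch {

    // Assumed key layout: yyyyMMdd + '|' + event id (not from the original post).
    static String rowKey(String day, String eventId) {
        return day + "|" + eventId;
    }

    // Returns every row with startDay <= day < stopDay as one contiguous range,
    // which is the same shape as a scan with a start row and a stop row.
    static NavigableMap<String, String> scanDays(
            NavigableMap<String, String> table, String startDay, String stopDay) {
        return table.subMap(startDay, true, stopDay, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(rowKey("20120225", "000017"), "GET /a");
        table.put(rowKey("20120226", "000003"), "GET /b");
        table.put(rowKey("20120227", "000001"), "GET /c");

        // One range covers Feb 25 and Feb 26; no per-day table lookup is needed.
        for (Map.Entry<String, String> e
                : scanDays(table, "20120225", "20120227").entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With one table per day, the same search would need a loop over table names
plus handling for days whose table does not exist yet.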

Regarding the right schema, it all depends on your use case, so it would
help if you could give more details on what you want to do with the stored
data. The row key, column family, and column name structure depend on your
access pattern (both reads and writes) and your sorting requirements.
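
As a small illustration of why sorting requirements matter for the row key,
the sketch below compares the string key from your option 2
(ordernum*itemnum) with a fixed-width binary encoding of the same two IDs.
It assumes both IDs fit in a non-negative int; the class and method names
are made up for this example, and plain JDK classes are used rather than
any HBase API. HBase orders rows by comparing key bytes as unsigned values,
which compareBytes imitates.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch comparing two encodings of a compound row key.
class RowKeySketch {

    // String form with a separator, e.g. "123456*654321".
    static byte[] stringKey(int orderNum, int itemNum) {
        return (orderNum + "*" + itemNum).getBytes(StandardCharsets.UTF_8);
    }

    // Fixed-width form: two big-endian 4-byte ints, always 8 bytes.
    static byte[] binaryKey(int orderNum, int itemNum) {
        return ByteBuffer.allocate(8).putInt(orderNum).putInt(itemNum).array();
    }

    // Unsigned lexicographic byte comparison, the order in which rows are kept.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Two 6-digit IDs: 13 bytes as a string, 8 bytes as binary.
        System.out.println(stringKey(123456, 654321).length);
        System.out.println(binaryKey(123456, 654321).length);

        // Unpadded strings sort "9*1" after "10*1"; the binary form keeps
        // numeric order for non-negative values.
        System.out.println(compareBytes(stringKey(9, 1), stringKey(10, 1)) > 0);
        System.out.println(compareBytes(binaryKey(9, 1), binaryKey(10, 1)) < 0);
    }
}
```

For 6-digit IDs the size difference is modest, so the ordering behaviour is
probably the bigger concern. If you stay with string keys and need numeric
order, zero-padding the IDs to a fixed width is the usual workaround.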

Thanks

On Sun, Feb 26, 2012 at 10:24 PM, Something Something <
[email protected]> wrote:

> Trying to design a HBase schema for a log processing application.  We will
> get new logs every day.
>
> 1)  We are thinking we will keep data for each day in separate tables.  The
> table names would be something like  XYZ-2012-02-26 etc.  There will be at
> most 4 tables for each day.
>
> Pros:
> Other processes that are processing old data are not affected while data is
> getting ready for each day.
> It's easier to delete old data that's no longer needed.  Just delete the
> tables.
>
> Cons:
> Lots of tables to deal with.
> Any other??
>
> (Other option is, of course, to create a Table with dates and other tables
> will have keys that contain date - at the end of the row key).
>
>
> 2)  We are thinking the RowKeys will be in String format with a separator
> character e.g.  ordernum*itemnum.  The keys will only contain IDs & these
> IDs will be small, probably 6 digits each.
>
> Pros:
> It's easier to look/search for data using HBase Shell.
> Very easy to implement.
>
> Cons:
> As pointed out here (http://hbase.apache.org/book/rowkey.design.html),
> Strings need nearly 3x the bytes.
>
> (Other option is to create separate classes for compound row keys. Is it
> worth the effort?)
>
>
> Is there a general consensus regarding these issues?  Thanks in advance for
> your help.
>
