if you want data to get deleted after a while, you can set a ttl on the
table's column families (in hbase, ttl is a per-column-family setting).
based on what you described, you can append the date to your row keys
(preferably as a suffix) so that you can do scans/gets appropriately for
your use case.
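
a rough sketch of what that could look like, in plain python with no hbase
client involved. the names (make_row_key, scan_range, TTL_SECONDS), the
zero-padding to 6 digits and the YYYYMMDD date format are my assumptions for
illustration; the '*' separator and the 60 day retention are taken from your
mails. the start/stop pair is what you would hand to a scan, where the stop
row is exclusive.

```python
from datetime import date

SEP = "*"  # separator from the ordernum*itemnum example in the thread

# 60 days expressed in seconds, the unit a column family TTL is given in.
TTL_SECONDS = 60 * 24 * 60 * 60


def make_row_key(order_num: int, item_num: int, day: date) -> bytes:
    """Build ordernum*itemnum*YYYYMMDD (hypothetical layout).

    IDs are zero-padded to 6 digits so that byte-wise (lexicographic)
    ordering of the keys matches numeric ordering of the IDs.
    """
    return f"{order_num:06d}{SEP}{item_num:06d}{SEP}{day:%Y%m%d}".encode("ascii")


def scan_range(order_num: int, item_num: int, first: date, last: date):
    """Return (start_row, stop_row) covering first..last inclusive.

    The stop row is treated as exclusive, so a zero byte is appended to
    the last key to get the smallest key that sorts after it.
    """
    start = make_row_key(order_num, item_num, first)
    stop = make_row_key(order_num, item_num, last) + b"\x00"
    return start, stop


if __name__ == "__main__":
    start, stop = scan_range(123, 45, date(2012, 2, 1), date(2012, 2, 26))
    print(start, stop, TTL_SECONDS)
```

with the date at the end, all days for one order/item sit next to each other,
so a date range for a known order/item is one contiguous scan. if you mostly
query by date across all orders, this layout would not help and the date
would have to move to the front of the key.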

thanks

On Sun, Feb 26, 2012 at 11:45 PM, Something Something <
[email protected]> wrote:

> >>why you even need hbase to store logs
> So that all the useful information in the logs can be sliced & diced any way
> we want quickly without the need for sequential search.  Isn't indexed
> search faster than sequential?  Isn't that why HBase (and BigTable for that
> matter) was created in the first place?
>
> >>you will not only have to deal with many tables
> This was pointed out in the 'Cons' section.  We understand that but
> deleting data older than 60 days is very easy.  Just need to delete those
> tables.
>
> >>... when data changes is going to be unnecessarily complex.
> Once created, data will NOT change.  The data is from logs from previous
> days.  It's historical data.
>
> >>if you have different tables for different days, it will get cumbersome
> to search..
> When a user needs data across multiple dates, we can either get data
> sequentially for each day for small queries OR for long running queries get
> data by running queries in parallel for each day & then combining results
> for all days.  Keeping HBase Regions separate for each day does provide
> some performance benefits - we think.  This is where we need help from the
> community.
>
> >>so if you can give more details on what you want to do with the stored
> data
> Hmm.  The 2nd question is more about understanding the pros & cons of using
> 'String' Vs 'Custom Class' for Row Keys.
>
> Thanks.
>
>
> On Sun, Feb 26, 2012 at 10:48 PM, T Vinod Gupta <[email protected]> wrote:
>
> > before even getting into schema design, I'm curious to know why you even
> > need hbase to store logs?
> >
> > coming to the options below, option 1 sounds very naive and
> > unsophisticated.. you will not only have to deal with many tables but the
> > processing around the times when the date changes is going to be
> > unnecessarily complex. besides, the most common use of logs is to search
> > for stuff. if you have different tables for different days, it will get
> > cumbersome to search..
> >
> > regarding the right schema, it all depends on your use case. so if you can
> > give more details on what you want to do with the stored data, that helps.
> > the row key, column family and column name structure depend on your
> > access pattern (both reads and writes) and sorting requirements.
> >
> > thanks
> >
> > On Sun, Feb 26, 2012 at 10:24 PM, Something Something <
> > [email protected]> wrote:
> >
> > > Trying to design an HBase schema for a log processing application.  We
> > > will get new logs every day.
> > >
> > > 1)  We are thinking we will keep data for each day in separate tables.
> > > The table names would be something like XYZ-2012-02-26 etc.  There will
> > > be at most 4 tables for each day.
> > >
> > > Pros:
> > > Other processes that are processing old data are not affected while
> > > data is getting ready for each day.
> > > It's easier to delete old data that's no longer needed.  Just delete
> > > the tables.
> > >
> > > Cons:
> > > Lots of tables to deal with.
> > > Any other??
> > >
> > > (Other option is, of course, to create a Table with dates and other
> > > tables will have keys that contain date - at the end of the row key).
> > >
> > >
> > > 2)  We are thinking the RowKeys will be in String format with a
> > > separator character e.g. ordernum*itemnum.  The keys will only contain
> > > IDs & these IDs will be small, probably 6 digits each.
> > >
> > > Pros:
> > > It's easier to look/search for data using HBase Shell.
> > > Very easy to implement.
> > >
> > > Cons:
> > > As pointed out here (http://hbase.apache.org/book/rowkey.design.html),
> > > Strings need nearly 3x the bytes.
> > >
> > > (Other option is to create separate Classes for compound row keys. Is
> > > it worth the effort?)
> > >
> > >
> > > Is there a general consensus regarding these issues?  Thanks in advance
> > > for your help.
> > >
> >
>
