Re: more tables or more rows

J Mohamed Zahoor Thu, 09 Aug 2012 01:22:07 -0700

Hi MK,

Some suggestions here:

http://www.hbasecon.com/sessions/lightning-talk-real-performance-gains-with-real-time-data/

./zahoor

On Wed, Aug 8, 2012 at 5:44 PM, M. Karthikeyan
<[email protected]> wrote:

> Hi,
> A slightly related question:
> We have time series data continuously flowing into the system that has to
> be stored in HBase.
> We have a retention policy of 90 days, so data older than 90
> days has to be deleted from HBase every midnight.
> There are two ways (that we know of) of doing this:
> 1) Since bulk deletes could be costly and dropping an entire table is
> easier, we could have day-wise tables and drop the oldest table each day
> 2) This post
> http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.user/9603
> suggests that we can have a single table and use the TTL feature for ageing
> out data.
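
To make the two options concrete, here is a minimal sketch in Python of the bookkeeping each one needs. The `events_` table prefix and both helper names are hypothetical, not part of HBase; the actual drop would go through whatever admin client you use. For option 2 the only number needed is the column family TTL, which HBase takes in seconds.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90        # from the retention policy in the question
TABLE_PREFIX = "events_"   # hypothetical naming scheme for day-wise tables


def table_for(day: date) -> str:
    """Name of the daily table that holds the records for `day`."""
    return TABLE_PREFIX + day.strftime("%Y%m%d")


def tables_to_drop(existing, today: date):
    """Daily tables that are more than RETENTION_DAYS old as of `today`."""
    cutoff = table_for(today - timedelta(days=RETENTION_DAYS))
    # Names end in zero-padded YYYYMMDD, so string order equals date order.
    return sorted(
        t for t in existing if t.startswith(TABLE_PREFIX) and t < cutoff
    )


# Option 2: one table, with the column family TTL expressed in seconds.
TTL_SECONDS = RETENTION_DAYS * 24 * 60 * 60

if __name__ == "__main__":
    today = date(2012, 8, 9)
    print(table_for(today))
    print(tables_to_drop(["events_20120510", "events_20120511"], today))
    print(TTL_SECONDS)
```

With option 1 the midnight job would disable and drop whatever `tables_to_drop` returns. With option 2 there is no job to run: as far as I know, expired cells stop being returned by reads and are cleaned off disk by compactions.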
>
> May I request someone to briefly list the pros and cons of each
> option?
> PS: We expect around 200 million records per day and each record would be
> approx. 500 bytes.
> Thanks & Regards
> MK
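
For scale, the figures in the PS work out as follows. This is only back-of-the-envelope arithmetic on the raw record size: the replication factor of 3 is my assumption (the usual HDFS default), and HBase key overhead and compression are ignored.

```python
RECORDS_PER_DAY = 200_000_000   # from the question
BYTES_PER_RECORD = 500          # from the question
RETENTION_DAYS = 90             # from the question
REPLICATION = 3                 # assumption: HDFS default replication factor

daily_bytes = RECORDS_PER_DAY * BYTES_PER_RECORD   # raw data written per day
retained_bytes = daily_bytes * RETENTION_DAYS      # raw data inside the window
on_disk_bytes = retained_bytes * REPLICATION       # after HDFS replication


def to_tb(n_bytes: int) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return n_bytes / 1e12


if __name__ == "__main__":
    print(f"per day:   {to_tb(daily_bytes):.1f} TB")
    print(f"retained:  {to_tb(retained_bytes):.1f} TB")
    print(f"on disk:   {to_tb(on_disk_bytes):.1f} TB")
```

So roughly 100 GB of raw data arrives per day, about 9 TB sits inside the 90-day window, and under the replication assumption that is about 27 TB on disk. Whichever option you choose, about 100 GB ages out every night.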
>
>
> -----Original Message-----
> From: Mohammad Tariq [mailto:[email protected]]
> Sent: 08 August 2012 03:19
> To: [email protected]
> Subject: Re: more tables or more rows
>
> Hello sir,
>
>     It is absolutely fine to have as many tables as we like. My point was
> that if we have a large number of tables, it might add some overhead in
> locating the user region, as there will be a large amount of mapping from
> "user tables" to "region servers". Also, the client will have to cache more
> of this information, which takes up additional memory. So I suggested having
> a small number of large tables rather than a large number of small tables, if
> the data is similar.
>
> Regards,
>     Mohammad Tariq
>
>
> On Tue, Aug 7, 2012 at 5:30 PM, Eric Czech <[email protected]> wrote:
> > Thanks Mohammad,
> >
> > By saying the major purpose is to host very large tables (implying a
> > smaller number of them), are you referring to anything other than the
> > memstores per column family taking up sizable portions of physical
> > memory?
> >  Are there other components or design aspects that make using large
> > numbers of tables inadvisable?
> >
> > On Sun, Aug 5, 2012 at 5:55 PM, Mohammad Tariq <[email protected]> wrote:
> >> Hello sir,
> >>
> >>       Going for a single table for all 30+ sources would be a better
> >> choice, if the data from all the sources is not very different.
> >> Since you are considering HBase as your data store, it wouldn't be
> >> wise to have several small tables. The major purpose of HBase is to
> >> host very large tables that may go beyond billions of rows and millions
> >> of columns.
> >>
> >> Regards,
> >>     Mohammad Tariq
> >>
> >>
> >> On Mon, Aug 6, 2012 at 3:18 AM, Eric Czech <[email protected]> wrote:
> >>> I need to support data that comes from 30+ sources and the structure
> >>> of that data is consistent across all the sources, but what I'm not
> >>> clear on is whether or not I should use 30+ tables with roughly the
> >>> same format or 1 table where the row key reflects the source.
> >>>
> >>> Anybody have a strong argument one way or the other?
> >>>
> >>> Thanks!
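
For the "1 table where the row key reflects the source" variant, one possible key layout is sketched below in Python. It is only an illustration: the 2-byte source id, the millisecond timestamp and both function names are my assumptions, not anything prescribed in this thread. The idea is that HBase sorts rows by the raw bytes of the key, so a fixed-width source prefix keeps each source's rows contiguous and in time order.

```python
import struct


def row_key(source_id: int, timestamp_ms: int) -> bytes:
    """2-byte source id followed by an 8-byte timestamp, both big-endian.

    Big-endian unsigned fields make byte order match numeric order.
    """
    return struct.pack(">HQ", source_id, timestamp_ms)


def source_scan_range(source_id: int):
    """Start (inclusive) and stop (exclusive) keys covering one source.

    Assumes source_id < 65535 so that source_id + 1 still fits in 2 bytes.
    """
    return struct.pack(">H", source_id), struct.pack(">H", source_id + 1)


if __name__ == "__main__":
    key = row_key(7, 1344500527000)
    start, stop = source_scan_range(7)
    print(key.hex(), start.hex(), stop.hex())
```

A scan bounded by `source_scan_range(7)` would then touch only source 7, which gives much the same isolation as a per-source table. One caveat: timestamps increase monotonically within a source, so each source's writes land at the tail of its key range; with 30+ prefixes the load is at least spread over several ranges.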
>
