Hi,
A slightly related question:
We have time series data continuously flowing into the system that has to be 
stored in HBase.
We have a retention policy of 90 days, so data older than 90 days has to be 
deleted from HBase every midnight.
There are two ways (that we know of) of doing this:
1) Since bulk deletes could be costly and dropping an entire table is easier, 
we could have day-wise tables and drop the oldest table each night.
2) This post http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.user/9603 
suggests that we can have a single table and use the TTL feature for ageing out 
data.
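
To make the two options concrete, here is a minimal Python sketch of the 
bookkeeping each one needs. The table prefix "events" and the YYYYMMDD naming 
scheme are assumptions made for illustration only; the sketch does not talk to 
HBase. It relies on column-family TTLs being expressed in seconds.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90

# Option 2: a column-family TTL is given in seconds.
TTL_SECONDS = RETENTION_DAYS * 24 * 60 * 60


def table_name(day, prefix="events"):
    """Hypothetical naming scheme for option 1: one table per calendar day."""
    return f"{prefix}_{day:%Y%m%d}"


def expired_tables(today, existing, prefix="events"):
    """Return the day-wise tables holding data older than RETENTION_DAYS.

    Zero-padded YYYYMMDD suffixes sort lexicographically in date order,
    so a plain string comparison against the cutoff name is enough.
    """
    cutoff = table_name(today - timedelta(days=RETENTION_DAYS), prefix)
    return sorted(
        t for t in existing if t.startswith(prefix + "_") and t < cutoff
    )
```

With option 1, the nightly job would disable and drop whatever 
expired_tables() returns. With option 2, there is no nightly job: TTL_SECONDS 
is set once on the column family and expired cells are cleaned up by HBase 
itself.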

Could someone briefly list the pros and cons of each option?
PS: We expect around 200 million records per day, and each record would be 
approx. 500 bytes.
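
For scale, the figures above work out as follows. This is back-of-the-envelope 
arithmetic on raw payload only; it ignores HBase key/value overhead, 
compression, and HDFS replication, all of which will change the on-disk 
numbers.

```python
RECORDS_PER_DAY = 200_000_000
BYTES_PER_RECORD = 500
RETENTION_DAYS = 90

# Raw payload written per day, in bytes (decimal units: 1 GB = 10**9 bytes).
daily_bytes = RECORDS_PER_DAY * BYTES_PER_RECORD

# Raw payload and record count held at steady state under 90-day retention.
retained_bytes = daily_bytes * RETENTION_DAYS
retained_records = RECORDS_PER_DAY * RETENTION_DAYS

print(f"per day:  {daily_bytes / 10**9:.0f} GB")
print(f"retained: {retained_bytes / 10**12:.0f} TB, "
      f"{retained_records / 10**9:.0f} billion records")
```

So either approach has to age out roughly 100 GB of raw data every day.
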
Thanks & Regards
MK


-----Original Message-----
From: Mohammad Tariq [mailto:[email protected]] 
Sent: 08 August 2012 03:19
To: [email protected]
Subject: Re: more tables or more rows

Hello sir,

    It is absolutely fine to have as many tables as we like. My point was that 
if we have a large number of tables, it might add some overhead in locating the 
user region, as there will be a huge amount of mapping from "user tables" to 
"region servers". Also, the client will have to cache more information, taking 
up additional memory. So I suggested having a small number of large tables 
rather than a large number of small tables, if the data is similar.
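
For the single-table layout, "a row key that reflects the source" could look 
like the Python sketch below. The field widths (2-byte source id, 8-byte 
millisecond timestamp) and the function name are assumptions for illustration, 
not something prescribed by HBase.

```python
import struct


def row_key(source_id, timestamp_ms):
    """Hypothetical key layout: 2-byte source id, then 8-byte timestamp.

    Both fields are packed big-endian and unsigned, so comparing keys
    byte by byte gives the same order as comparing (source_id, timestamp_ms).
    """
    return struct.pack(">HQ", source_id, timestamp_ms)
```

Since HBase keeps rows sorted by key bytes, all rows of one source would sit 
next to each other in time order, and a scan over a single source becomes a 
contiguous key range.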

Regards,
    Mohammad Tariq


On Tue, Aug 7, 2012 at 5:30 PM, Eric Czech <[email protected]> wrote:
> Thanks Mohammad,
>
> By saying the major purpose is to host very large tables (implying a 
> smaller number of them), are you referring to anything other than the 
> memstores per column family taking up sizable portions of physical memory?
>  Are there other components or design aspects that make using large 
> numbers of tables inadvisable?
>
> On Sun, Aug 5, 2012 at 5:55 PM, Mohammad Tariq <[email protected]> wrote:
>> Hello sir,
>>
>>       Going for a single table covering all 30+ sources would be a better 
>> choice, if the data from all the sources is not very different. 
>> Since you are considering HBase as your data store, it wouldn't be 
>> wise to have several small tables. The major purpose of HBase is to 
>> host very large tables that may go beyond billions of rows and millions of 
>> columns.
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>> On Mon, Aug 6, 2012 at 3:18 AM, Eric Czech <[email protected]> wrote:
>>> I need to support data that comes from 30+ sources and the structure 
>>> of that data is consistent across all the sources, but what I'm not 
>>> clear on is whether or not I should use 30+ tables with roughly the 
>>> same format or 1 table where the row key reflects the source.
>>>
>>> Anybody have a strong argument one way or the other?
>>>
>>> Thanks!
