Hi MK,

Some suggestions here:
http://www.hbasecon.com/sessions/lightning-talk-real-performance-gains-with-real-time-data/

./zahoor

On Wed, Aug 8, 2012 at 5:44 PM, M. Karthikeyan <[email protected]> wrote:

> Hi,
> A slightly related question:
> We have time series data continuously flowing into the system and has to
> be stored in HBase.
> We have retention policy to retain data for 90 days, so data older than 90
> days have to be deleted from HBase every midnight.
> There are two (that we know) ways of doing this:
> 1) Since bulk deletes could be costly and dropping an entire table is
> easier, we could have day wise tables and drop entire table
> 2) This post
> http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.user/9603 suggests
> that we can have a single table and use the TTL feature for ageing
> out data.
>
> May I request someone to briefly list out the pros and cons of either
> options?
> PS: We expect around 200 million records per day and each record would be
> approx. 500 bytes.
> Thanks & Regards
> MK
>
>
> -----Original Message-----
> From: Mohammad Tariq [mailto:[email protected]]
> Sent: 08 August 2012 03:19
> To: [email protected]
> Subject: Re: more tables or more rows
>
> Hello sir,
>
> It is absolutely fine to have as many tables as we like. My point was
> that if we have a large no of tables then it might add some overhead in
> locating the user region, as there will be a huge amount of mapping from
> "user tables" to "region servers". Also, client will have to cache more
> information blocking the additional memory. So, I suggested to have small
> no of large tables rather than large no of small tables, if the data is
> similar.
>
> Regards,
> Mohammad Tariq
>
>
> On Tue, Aug 7, 2012 at 5:30 PM, Eric Czech <[email protected]> wrote:
> > Thanks Mohammad,
> >
> > By saying the major purpose is to host very large tables (implying a
> > smaller number of them), are you referring to anything other than the
> > memstores per column family taking up sizable portions of physical memory?
> > Are there other components or design aspects that make using large
> > numbers of tables inadvisable?
> >
> > On Sun, Aug 5, 2012 at 5:55 PM, Mohammad Tariq <[email protected]> wrote:
> >> Hello sir,
> >>
> >> Going for a single table with 30+ rows would be a better
> >> choice, if the data from all the sources is not very different.
> >> Since, you are considering Hbase as your data store, it wouldn't be
> >> wise to have several small rows. The major purpose of Hbase is to
> >> host very large tables that may go beyond billions of rows and millions
> >> of columns.
> >>
> >> Regards,
> >> Mohammad Tariq
> >>
> >>
> >> On Mon, Aug 6, 2012 at 3:18 AM, Eric Czech <[email protected]> wrote:
> >>> I need to support data that comes from 30+ sources and the structure
> >>> of that data is consistent across all the sources, but what I'm not
> >>> clear on is whether or not I should use 30+ tables with roughly the
> >>> same format or 1 table where the row key reflects the source.
> >>>
> >>> Anybody have a strong argument one way or the other?
> >>>
> >>> Thanks!
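
For anyone weighing the two retention options MK lists, the numbers in his message can be turned into a rough back-of-the-envelope sketch. This is only an illustration under stated assumptions: it counts raw payload bytes (no HBase key/value overhead, compression, or HDFS replication), the helper names are made up for this example, and the `events_YYYYMMDD` table naming scheme is hypothetical, not something from the thread.

```python
from datetime import date, timedelta

# Figures taken from MK's message.
RECORDS_PER_DAY = 200_000_000
BYTES_PER_RECORD = 500          # approximate record size
RETENTION_DAYS = 90


def raw_bytes_per_day() -> int:
    """Raw payload written per day (ignores HBase overhead and replication)."""
    return RECORDS_PER_DAY * BYTES_PER_RECORD


def raw_bytes_retained() -> int:
    """Raw payload held at steady state under the 90-day retention policy."""
    return raw_bytes_per_day() * RETENTION_DAYS


def ttl_seconds() -> int:
    """Option 2 (single table + TTL): the retention period in seconds,
    the unit HBase uses for a column family's TTL."""
    return RETENTION_DAYS * 24 * 60 * 60


def daily_table_name(day: date, prefix: str = "events_") -> str:
    """Option 1 (day-wise tables): hypothetical naming scheme, one table per day."""
    return f"{prefix}{day:%Y%m%d}"


def table_to_drop(today: date) -> str:
    """Option 1: at midnight starting `today`, the table for (today - 90 days)
    still holds data that is 89 to 90 days old, so the first table whose data
    is entirely older than 90 days is the one from 91 days ago."""
    return daily_table_name(today - timedelta(days=RETENTION_DAYS + 1))


if __name__ == "__main__":
    print(raw_bytes_per_day(), raw_bytes_retained(), ttl_seconds())
    print(table_to_drop(date(2012, 8, 8)))
```

Under these assumptions the raw volume works out to about 100 GB per day and about 9 TB retained, and a 90-day TTL is 7,776,000 seconds. With day-wise tables there would be roughly 90 live tables at any time, which is where Mohammad's point about per-table overhead becomes relevant.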
