Well, those regions are still distributed. Depending on the amount of data you generate per day, you may have only one region per day, but even if you were using a single table and storing those rows next to each other, the access pattern would stay the same, no?
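
To illustrate why the access pattern does not change, here is a rough sketch in plain Python. It does not use the HBase client at all; it only simulates a range scan over sorted row keys. The key layout (`YYYYMMDD|<rest>`), the helper names, and the toy keys are assumptions made up for the example, not your actual schema.

```python
from bisect import bisect_left
from datetime import date, timedelta


def day_scan_bounds(day: date) -> tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering every key prefixed with `day`.

    Assumes a hypothetical key layout b"YYYYMMDD|<rest>". The stop row is
    exclusive, which is how an HBase scan treats it by default.
    """
    start = day.strftime("%Y%m%d").encode()
    stop = (day + timedelta(days=1)).strftime("%Y%m%d").encode()
    return start, stop


def scan(sorted_keys: list[bytes], start: bytes, stop: bytes) -> list[bytes]:
    """Simulate a range scan over lexicographically sorted row keys."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Toy keys standing in for one aggregation table (made-up data).
keys = sorted([
    b"20110308|host-a",
    b"20110309|host-a",
    b"20110309|host-b",
    b"20110310|host-a",
])

start, stop = day_scan_bounds(date(2011, 3, 9))
one_day = scan(keys, start, stop)  # only the 2011-03-09 rows
```

Because the date leads the key, one day's rows sit next to each other in the single table, so a day-bounded scan reads the same contiguous range that a per-day table would hold.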

J-D

On Wed, Mar 9, 2011 at 3:37 PM, Peter Haidinyak <[email protected]> wrote:
> I do that now but if they were in different table I could thread that out
> with one thread per table. I'm just worried I lose the advantage of HBase and
> a distributed system if the table ends up on one region server.
>
> -Pete
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
> Sent: Wednesday, March 09, 2011 3:14 PM
> To: [email protected]
> Subject: Re: Many smaller tables vs one large table
>
> I guess it could be a good idea... do you need to be able to scan for
> data that's contained in more than one day?
>
> J-D
>
> On Wed, Mar 9, 2011 at 2:08 PM, Peter Haidinyak <[email protected]> wrote:
>> Hi all,
>> Right now I am aggregating our log data and populating tables based on
>> how we want to query the data later. Currently I have eleven different
>> aggregation tables and the date is part of the Row key. Since we usually
>> slice our data by day I was wondering if it would be better to create
>> aggregation table by date. I would no longer have to use the date as part of
>> the stop/end row keys in a scan and it would be easier to prune old data. I
>> would also guess there would be less contention on tables between the
>> process that populates the table and the processes that query the table. One
>> of the only problems I see, with my limited knowledge about HBase, is the
>> tables will end up being rather small and would most likely end up on one
>> region server.
>> Long story short, is this a good idea?
>>
>> Thanks
>>
>> -Pete
>>
>
