Well, those regions are still distributed. Depending on the amount
of data you generate per day, you may have only one region per day, but
even if you were using a single table and storing those rows next to
each other, the access pattern would stay the same, no?

J-D
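A sketch of the single-table layout, assuming the date is the leading, fixed-width part of the row key. The "yyyyMMdd|rest" format and all class and method names are hypothetical. HBase keeps rows sorted by the byte order of their keys, so each day's rows form one contiguous range. A reader can still fan out with one thread per day by giving each thread its own start row (inclusive) and stop row (exclusive), which are the bounds an HBase Scan takes. This is plain Java with no HBase dependency: a TreeMap ordered by unsigned byte comparison stands in for the table.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical row key layout: "yyyyMMdd|<rest of key>". The date prefix is
// fixed-width, so byte order matches chronological order.
final class DayKeyRange {
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    static byte[] rowKey(LocalDate day, String rest) {
        return (day.format(DAY) + "|" + rest).getBytes(StandardCharsets.UTF_8);
    }

    // Start row (inclusive) and stop row (exclusive) covering every key whose
    // date prefix falls within [from, to].
    static byte[][] scanBounds(LocalDate from, LocalDate to) {
        byte[] start = from.format(DAY).getBytes(StandardCharsets.UTF_8);
        byte[] stop = to.plusDays(1).format(DAY).getBytes(StandardCharsets.UTF_8);
        return new byte[][] {start, stop};
    }

    // In-memory stand-in for a table: keys sorted by unsigned byte comparison.
    static TreeMap<byte[], String> newTable() {
        Comparator<byte[]> byteOrder = Arrays::compareUnsigned;
        return new TreeMap<>(byteOrder);
    }

    // Keys that a scan over [start, stop) would return, decoded as strings.
    static List<String> scan(TreeMap<byte[], String> table, LocalDate from, LocalDate to) {
        byte[][] bounds = scanBounds(from, to);
        List<String> keys = new ArrayList<>();
        for (byte[] key : table.subMap(bounds[0], true, bounds[1], false).keySet()) {
            keys.add(new String(key, StandardCharsets.UTF_8));
        }
        return keys;
    }

    // One task per day, each scanning only its own day's range of the single
    // table (read-only access, so sharing the TreeMap across threads is safe).
    static Map<LocalDate, Integer> countPerDayInParallel(
            TreeMap<byte[], String> table, LocalDate from, LocalDate to) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Map<LocalDate, Future<Integer>> pending = new TreeMap<>();
            for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
                LocalDate day = d;
                pending.put(day, pool.submit(() -> scan(table, day, day).size()));
            }
            Map<LocalDate, Integer> counts = new TreeMap<>();
            for (Map.Entry<LocalDate, Future<Integer>> entry : pending.entrySet()) {
                counts.put(entry.getKey(), entry.getValue().get());
            }
            return counts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In this layout a range covering several days is still one contiguous span, so a multi-day query needs a single start/stop pair. With one table per day, the same query would need one scan per table.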

On Wed, Mar 9, 2011 at 3:37 PM, Peter Haidinyak <[email protected]> wrote:
> I do that now, but if they were in different tables I could thread that out
> with one thread per table. I'm just worried I'll lose the advantage of HBase and
> a distributed system if the table ends up on one region server.
>
> -Pete
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
> Sent: Wednesday, March 09, 2011 3:14 PM
> To: [email protected]
> Subject: Re: Many smaller tables vs one large table
>
> I guess it could be a good idea... do you need to be able to scan for
> data that's contained in more than one day?
>
> J-D
>
> On Wed, Mar 9, 2011 at 2:08 PM, Peter Haidinyak <[email protected]> wrote:
>> Hi all,
>>    Right now I am aggregating our log data and populating tables based on
>> how we want to query the data later. Currently I have eleven different
>> aggregation tables, and the date is part of the row key. Since we usually
>> slice our data by day, I was wondering if it would be better to create
>> separate aggregation tables per date. I would no longer have to use the date
>> as part of the start/stop row keys in a scan, and it would be easier to prune
>> old data. I would also guess there would be less contention on tables between
>> the process that populates the table and the processes that query the table.
>> One of the few problems I see, with my limited knowledge of HBase, is that
>> the tables will end up being rather small and would most likely end up on one
>> region server.
>>        Long story short, is this a good idea?
>>
>> Thanks
>>
>> -Pete
>>
>
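For the per-day-table alternative, pruning reduces to listing table names and dropping the ones older than a retention window. This is a minimal sketch that assumes a hypothetical naming scheme of <base>_yyyyMMdd (for example hits_20110309). The class and method names are illustrative, and fetching the list of existing table names from the cluster is left out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical naming scheme for per-day aggregation tables: "<base>_yyyyMMdd".
final class DailyTables {
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    static String tableFor(String base, LocalDate day) {
        return base + "_" + day.format(DAY);
    }

    // Returns the tables of the given base whose day is strictly before
    // (today - retentionDays). Names that don't match the scheme are ignored.
    static List<String> tablesToDrop(List<String> existing, String base,
                                     LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        String prefix = base + "_";
        List<String> drop = new ArrayList<>();
        for (String name : existing) {
            if (!name.startsWith(prefix)) {
                continue; // belongs to another aggregation
            }
            try {
                LocalDate day = LocalDate.parse(name.substring(prefix.length()), DAY);
                if (day.isBefore(cutoff)) {
                    drop.add(name);
                }
            } catch (DateTimeParseException e) {
                // Suffix is not a yyyyMMdd date, so this is not a per-day table.
            }
        }
        return drop;
    }
}
```

In HBase a table has to be disabled before it can be deleted, so each returned name would go through a disable followed by a delete via the admin API.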
