There are many considerations here, but one is that separate tables give you 
completely separate namespaces.
If you use one table, the design of the key space is more involved, as you need 
to separate the namespaces with key prefixes.


So if you never have to access data from separate key spaces in a single scan, 
then go for multiple tables.
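
To make the key-prefix idea concrete, here is a minimal sketch of how a 
single-table key space could be partitioned by action type. It is plain Python 
with no HBase client involved. The helper names, the NUL separator and the key 
layout (action type, user id, timestamp) are all assumptions for illustration, 
not something prescribed by HBase or by the original question.

```python
# Hypothetical key layout: <action_type> SEP <user_id> SEP <8-byte timestamp>.
# SEP and both helper names are illustrative assumptions.
SEP = b"\x00"


def make_row_key(action_type, user_id, ts_millis):
    """Build a row key whose leading bytes identify the action type."""
    return (action_type.encode() + SEP
            + user_id.encode() + SEP
            + ts_millis.to_bytes(8, "big"))


def prefix_scan_range(action_type):
    """Return [start, stop) bounds that cover every key of one action type."""
    prefix = action_type.encode()
    start = prefix + SEP
    # The stop key is the prefix followed by the byte just after SEP.
    stop = prefix + bytes([SEP[0] + 1])
    return start, stop


# Simulate a table sorted by row key with a plain sorted list.
keys = sorted([
    make_row_key("search", "u1", 1000),
    make_row_key("login", "u2", 2000),
    make_row_key("pageview", "u1", 1500),
    make_row_key("search", "u2", 3000),
])

# A scan bounded by the prefix range only touches the "search" rows.
start, stop = prefix_scan_range("search")
search_keys = [k for k in keys if start <= k < stop]
```

With a layout like this, a scan over one action type stays inside its prefix 
range, while a scan without bounds walks all action types in one pass, which is 
the case where a single table pays off.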

On the other hand, one big table will probably distribute better over the 
regionservers and lead to fewer regions overall.

So it depends on how many tables you envision. 10 or 20 or even 100 or so is 
probably OK. 1000 tables or more will lead to very many regions and hence 
overhead at the regionservers.
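
A rough way to see why the table count matters: every table occupies at least 
one region, no matter how little data it holds. The sketch below is only 
back-of-the-envelope arithmetic. The function name, the 10 GB maximum region 
size and the table sizes are made-up assumptions, not measurements or 
recommended settings.

```python
import math


def estimate_regions(table_sizes_mb, max_region_mb=10 * 1024):
    """Rough region count, assuming each table needs at least one region
    and splits once it outgrows max_region_mb (an assumed setting)."""
    return sum(max(1, math.ceil(size / max_region_mb))
               for size in table_sizes_mb)


# 1000 small tables of 100 MB each (hypothetical sizes).
many_small = estimate_regions([100] * 1000)

# The same 100,000 MB of data kept in one big table.
one_big = estimate_regions([100 * 1000])
```

Under these assumptions the 1000 small tables need 1000 regions, while the 
single table holding the same data needs 10.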



________________________________
 From: Mark <[email protected]>
To: [email protected] 
Sent: Sunday, November 20, 2011 9:54 AM
Subject: Re: Multiple tables vs big fat table
 
I'm more interested in how and why it would depend rather than the 
actual answer.

In evenly distributed systems you should do x/y because ..... If your 
data is not evenly distributed then you should...

Thanks


On 11/20/11 12:57 AM, Michel Segel wrote:
> Mark,
> Simple answer ... it depends... ;-)
>
> Longer answer...
> What's your use case? What's your access pattern? Is the type of data in 
> this case evenly distributed in terms of size?
>
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Nov 18, 2011, at 3:29 PM, Mark<[email protected]>  wrote:
>
>> Is it better to have many smaller tables or one larger table? For example, 
>> if we wanted to store user action logs we could do either of the following:
>>
>> Multiple tables:
>> - SearchLog
>> - PageViewLog
>> - LoginLog
>>
>> or
>>
>> One table:
>>   - ActionLog where the key could be a concatenation of the action type, i.e. 
>> (search, pageview, login)
>>
>> Any ideas? Are there any performance considerations on having multiple 
>> smaller tables?
>>
>> Thanks
>>
>>
