Re: Multiple tables vs big fat table

Mark Sun, 20 Nov 2011 17:18:57 -0800

Thanks for the info.

On 11/20/11 11:30 AM, lars hofhansl wrote:

There are many considerations here, but one is that separate tables provide a 
completely separate namespace.
If you use one table, design of the key space is more involved, as you need to 
separate the namespaces with key prefixes.
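
To illustrate the key-prefix idea, here is a rough sketch in plain Python (not the HBase client API). The `make_row_key` helper, the zero-byte separator and the user id / timestamp fields are all assumptions made for the example; the thread only says that the action type becomes part of the key.

```python
import struct

SEP = b"\x00"  # hypothetical separator; assumed never to occur in the fields


def make_row_key(action_type: str, user_id: str, ts_millis: int) -> bytes:
    """Build <action_type> SEP <user_id> SEP <8-byte big-endian timestamp>."""
    return (
        action_type.encode("utf-8")
        + SEP
        + user_id.encode("utf-8")
        + SEP
        + struct.pack(">Q", ts_millis)  # big-endian so byte order matches time order
    )


# Keys compare byte-wise, so rows of one action type end up next to each other.
keys = sorted(
    [
        make_row_key("search", "u1", 2),
        make_row_key("login", "u9", 5),
        make_row_key("pageview", "u1", 1),
        make_row_key("search", "u1", 1),
    ]
)
```

Because the prefix comes first, each action type occupies its own contiguous slice of the sorted key space, which is what stands in for a separate namespace.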



So if you never have to access data from separate "key spaces" in a single scan, 
then go for multiple tables.
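
As a sketch of what a scan over one "key space" amounts to in a single prefixed table: the sorted list below stands in for the table, and `prefix_stop` and `scan_prefix` are hypothetical helpers, not HBase calls. The idea is only that a prefix maps to a start key and an exclusive stop key.

```python
from bisect import bisect_left


def prefix_stop(prefix: bytes) -> bytes:
    """Smallest key sorting after every key with this prefix (b"" = no bound)."""
    p = bytearray(prefix)
    while p and p[-1] == 0xFF:  # trailing 0xFF bytes cannot be incremented
        p.pop()
    if not p:
        return b""
    p[-1] += 1
    return bytes(p)


def scan_prefix(sorted_keys, prefix: bytes):
    """Return the keys in [prefix, prefix_stop(prefix)) from a sorted list."""
    lo = bisect_left(sorted_keys, prefix)
    stop = prefix_stop(prefix)
    hi = bisect_left(sorted_keys, stop) if stop else len(sorted_keys)
    return sorted_keys[lo:hi]


table = sorted([b"login\x00a", b"pageview\x00b", b"search\x00c", b"search\x00d"])
search_rows = scan_prefix(table, b"search\x00")
everything = scan_prefix(table, b"")  # one scan across all key spaces
```

With separate tables the first kind of scan needs no prefix at all, but the second kind (everything in one pass) is no longer a single scan.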

On the other hand, one big table will probably distribute better over the 
regionservers and lead to fewer regions overall.

So it depends on how many tables you envision. 10 or 20 or even 100 or so is 
probably OK. 1000 tables or more will lead to very many regions and hence 
overhead at the regionservers.
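
A back-of-the-envelope sketch of why the table count matters, assuming only that every table needs at least one region and that a table grows more regions roughly in proportion to its size. The 1000 MB region size and the table sizes are made-up numbers, and real split behaviour is more involved than a simple ceiling.

```python
def estimated_regions(table_sizes_mb, max_region_mb=1000):
    """Rough region count: at least one per table, more for larger tables."""
    # -(-a // b) is integer ceiling division
    return sum(max(1, -(-size // max_region_mb)) for size in table_sizes_mb)


many_small = estimated_regions([100] * 1000)  # 1000 tables of 100 MB each
one_big = estimated_regions([100 * 1000])     # the same data in a single table
```

Under these assumptions the many-table layout ends up with one region per table regardless of how little data each holds, while the single table only needs as many regions as its total size calls for.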



________________________________
From: Mark <[email protected]>
To: [email protected]
Sent: Sunday, November 20, 2011 9:54 AM
Subject: Re: Multiple tables vs big fat table

I'm more interested in how and why it would depend rather than the
actual answer.

In evenly distributed systems you should do x/y because ..... If your
data is not evenly distributed then you should...

Thanks


On 11/20/11 12:57 AM, Michel Segel wrote:

Mark,
Simple answer ... it depends... ;-)

Longer answer...
What's your use case? What's your access pattern? Is the type of data, in this 
case, evenly distributed in terms of size?



Sent from a remote device. Please excuse any typos...

Mike Segel

On Nov 18, 2011, at 3:29 PM, Mark <[email protected]> wrote:

Is it better to have many smaller tables or one larger table? For example, if 
we wanted to store user action logs we could do either of the following:

Multiple tables:
- SearchLog
- PageViewLog
- LoginLog

or

One table:
- ActionLog, where the key could be a concatenation that includes the action 
  type (i.e. search, pageview, login)

Any ideas? Are there any performance considerations on having multiple smaller 
tables?

Thanks
