Thanks for the info.
On 11/20/11 11:30 AM, lars hofhansl wrote:
There are many considerations here, but one is that separate tables provide a
completely separate namespace.
If you use one table, the design of the key space is more involved, as you need to
separate the namespaces with key prefixes.
So if you never have to access data from separate key spaces in a single scan,
then go for multiple tables.
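
As a rough sketch of what separating key spaces with prefixes means in a single table: the snippet below uses plain Python and a sorted list to stand in for HBase's sorted row keys. It is not the HBase client API, and the key layout (`type|user|sequence`) and the `prefix_scan` helper are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left

# Hypothetical row keys in one shared table, kept in sorted order
# the way row keys are ordered within a table.
rows = sorted([
    b"login|alice|0003",
    b"pageview|alice|0001",
    b"pageview|bob|0002",
    b"search|alice|0004",
    b"search|bob|0005",
])

def prefix_scan(sorted_keys, prefix):
    """Return all keys that start with prefix, using the sort order to find the start."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so the prefix range has ended
        matches.append(key)
    return matches

pageviews = prefix_scan(rows, b"pageview|")
```

With separate tables, the prefix and the range bookkeeping disappear, since each table is its own namespace. With one table, every read has to carry the prefix.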
On the other hand, one big table will probably distribute better over the
regionservers and lead to fewer regions overall.
So it depends on how many tables you envision. 10 or 20 or even 100 or so is
probably OK. 1000 tables or more will lead to very many regions and hence
overhead at the regionservers.
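
A back-of-the-envelope sketch of why many tables mean many regions: every table has at least one region, however small it is. The `max_region_gb` value and the table sizes below are assumptions picked for the arithmetic, not HBase defaults or measurements.

```python
import math

def estimate_regions(table_sizes_gb, max_region_gb=10.0):
    """Rough region count: each table needs at least one region,
    and larger tables split roughly by size (max_region_gb is an assumed value)."""
    return sum(max(1, math.ceil(size / max_region_gb)) for size in table_sizes_gb)

many_small = estimate_regions([0.5] * 1000)  # 1000 tables of 0.5 GB each
one_big = estimate_regions([500.0])          # the same 500 GB in a single table
```

Under these assumed numbers the same 500 GB costs 1000 regions as small tables and 50 regions as one table, which is the overhead difference described above.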
________________________________
From: Mark<[email protected]>
To: [email protected]
Sent: Sunday, November 20, 2011 9:54 AM
Subject: Re: Multiple tables vs big fat table
I'm more interested in how and why it would depend rather than the
actual answer.
In evenly distributed systems you should do x/y because ..... If your
data is not evenly distributed then you should...
Thanks
On 11/20/11 12:57 AM, Michel Segel wrote:
Mark,
Simple answer ... it depends... ;-)
Longer answer...
What's your use case? What's your access pattern? Is the type of data, in this
case, evenly distributed in terms of size?
Sent from a remote device. Please excuse any typos...
Mike Segel
On Nov 18, 2011, at 3:29 PM, Mark<[email protected]> wrote:
Is it better to have many smaller tables or one larger table? For example, if
we wanted to store user action logs we could do either of the following:
Multiple tables:
- SearchLog
- PageViewLog
- LoginLog
or
One table:
- ActionLog, where the key could be a concatenation that includes the action
  type, i.e. (search, pageview, login)
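
To make the one-table option concrete, here is a minimal sketch of such a concatenated key. The field order, the `|` separator, the `action_log_key` name, and the user id and timestamp fields are all assumptions for illustration; only the action type comes from the example above.

```python
def action_log_key(action_type, user_id, timestamp_ms):
    """Build a hypothetical ActionLog row key: action type first, then user, then time."""
    # Zero-pad the timestamp so that byte-wise ordering matches numeric ordering.
    return f"{action_type}|{user_id}|{timestamp_ms:013d}".encode("utf-8")

key = action_log_key("search", "u42", 1321630140000)
```

Putting the action type first keeps all rows of one type adjacent in the key space.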
Any ideas? Are there any performance considerations on having multiple smaller
tables?
Thanks