There are many considerations here, but one is that separate tables provide a completely separate namespace. If you use one table, the design of the key space is more involved, because you need to separate the namespaces with key prefixes.
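To make the key-prefix approach concrete for the ActionLog example in the quoted question, here is a minimal sketch. It assumes a hypothetical row-key layout of <action>|<user_id>|<zero-padded timestamp>. It is plain Python, not HBase client code: a sorted list stands in for the table, because HBase stores rows sorted by row-key bytes. The helpers make_row_key, prefix_stop_row and prefix_scan, and the "|" separator, are illustrative assumptions.

```python
# Sketch only: emulates separate "key spaces" inside one table using
# row-key prefixes. A sorted list of bytes stands in for an HBase table
# (rows sorted by row key); no real HBase API is used.
from bisect import bisect_left

SEP = b"|"  # hypothetical separator between key components


def make_row_key(action: str, user_id: str, ts: int) -> bytes:
    """Hypothetical layout: <action>|<user_id>|<13-digit zero-padded ts>."""
    return SEP.join([action.encode(), user_id.encode(), b"%013d" % ts])


def prefix_stop_row(prefix: bytes) -> bytes:
    """Smallest key greater than every key that starts with `prefix`.

    Drops trailing 0xFF bytes, then increments the last remaining byte.
    Returns b"" (meaning "scan to the end of the table") if nothing is left.
    """
    p = bytearray(prefix)
    while p and p[-1] == 0xFF:
        p.pop()
    if not p:
        return b""
    p[-1] += 1
    return bytes(p)


def prefix_scan(rows: list[bytes], prefix: bytes) -> list[bytes]:
    """Return the rows in [prefix, stop_row) of a sorted list, like a range scan."""
    start = bisect_left(rows, prefix)
    stop_key = prefix_stop_row(prefix)
    stop = bisect_left(rows, stop_key) if stop_key else len(rows)
    return rows[start:stop]


if __name__ == "__main__":
    rows = sorted([
        make_row_key("search", "u1", 1000),
        make_row_key("pageview", "u1", 1001),
        make_row_key("login", "u2", 1002),
        make_row_key("search", "u2", 1003),
    ])
    # Only the "search" key space is returned.
    for row in prefix_scan(rows, b"search" + SEP):
        print(row)
```

Including the separator in the scan prefix matters: a bare "search" prefix would also match keys of a hypothetical "searchresult" action. With one table, a single range scan can also cover several adjacent key spaces, whereas with separate tables each namespace needs its own scan.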
So if you never have to access data from separate "key spaces" in a single scan, go for multiple tables.
On the other hand, one big table will probably distribute better over the regionservers and lead to fewer regions overall.

So it depends on how many tables you envision. 10, 20, or even 100 or so is probably OK.
1000 tables or more will lead to very many regions and hence overhead at the regionservers.

________________________________
From: Mark <[email protected]>
To: [email protected]
Sent: Sunday, November 20, 2011 9:54 AM
Subject: Re: Multiple tables vs big fat table

I'm more interested in how and why it would depend rather than the actual answer. In evenly distributed systems you should do x/y because..... If your data is not evenly distributed, then you should...

Thanks

On 11/20/11 12:57 AM, Michel Segel wrote:
> Mark,
> Simple answer ... it depends... ;-)
>
> Longer answer...
> What's your use case? What's your access pattern? Is the type of data, in
> this case, evenly distributed in terms of size?
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Nov 18, 2011, at 3:29 PM, Mark <[email protected]> wrote:
>
>> Is it better to have many smaller tables or one larger table? For example,
>> if we wanted to store user action logs, we could do either of the following:
>>
>> Multiple tables:
>> - SearchLog
>> - PageViewLog
>> - LoginLog
>>
>> or
>>
>> One table:
>> - ActionLog, where the key could be a concatenation of the action type, i.e.
>> (search, pageview, login)
>>
>> Any ideas? Are there any performance considerations in having multiple
>> smaller tables?
>>
>> Thanks
