Robert Walker wrote:
> Mike P. wrote:
>> Hello,
>> 
>> I've decided to use multiple tables for an entity (e.g. "todo_items1,"
>> "todo_items2," "todo_items3," etc.), instead of just one main table
>> which could end up having a lot of rows (e.g. just "todo_items"). I'm
>> doing this to try and to avoid a potential future performance drop that
>> could come with having too many rows in one table.
> 
> This sounds to me like the very definition of "Premature Optimization."
> 
> http://en.wikipedia.org/wiki/Program_optimization#When_to_optimize
> 
> What you're describing is called "Database Sharding."
> 
> http://en.wikipedia.org/wiki/Shard_(database_architecture)
> 
> However, I would very highly recommend the simpler design, unless you 
> are having scaling problems right now. It sounds like you're at the 
> beginning stages of design so I doubt that is the case.
> 
> Putting optimization before design leads to complex, difficult to 
> maintain systems. At present you have no verifiable metrics to determine 
> whether the more complex design will ever be needed. Why pay the cost 
> for it now? Wouldn't it be smarter to implement optimizations after you 
> have solid metrics to prove the need for said optimizations?

Hi Robert,

Thank you for your response. I totally agree that not everything needs 
to be optimized upfront, but there's also a problem with waiting for the 
metrics: When I have those metrics that prove the need for these 
optimizations, the performance will already be getting affected. And 
depending on how closely the performance is being watched, it could 
already be something that's annoying to the user, and that's not good.

That, and I think it's easier, in this case, to have this built-in from 
the start, rather than having to add this in later on which would 
(probably) cause me to have to shut down the database when it comes time 
to do the actual upgrade, which would be another inconvenience for the 
user. I'd also like to avoid having to move the pre-existing data around 
for optimization, which I know I'd end up doing if I postpone this.

Creating the multiple tables now, when there's no real data, takes care 
of those issues and helps me get the infrastructure in place and 
well-tested.

Also, splitting up the tables upfront will help to delay the need for 
some future optimization. since index performance starts to drop once 
the tables get too large (e.g. around one million records or so). If I 
make five tables then, in theory, that index performance decrease 
shouldn't happen until those tables reach 1 million+ records, which will 
take longer since the data is being spread across the tables.

As for sharding, thank you for the advice, but it seems like an even 
more complicated solution than just using a few additional tables. When 
looking up sharding, I found a number people that recommend against it 
unless it's absolutely necessary. So, yes, sharding would be a good 
thing to wait for metrics before implementing it.

The links I found for the cons of sharding are from Updates 2, 3 and 4 
at the following site:

http://highscalability.com/unorthodox-approach-database-design-coming-shard

So, I'm still thinking that I want to split up the tables upfront. I do 
agree that "premature optimization" isn't always a good thing, but it's 
not always a bad thing either. :)

(Also, on another note, I'm using PostgreSQL and I only have the one 
database server, so moving a table/database off to another server isn't 
an option right now. [And I know that the shards can be on the same 
server, but it doesn't look like something I want to implement right 
now.])

So, any advice on a good Ruby/Rails way to handle this?

Thanks,
Mike

-- 
Posted via http://www.ruby-forum.com/.

-- 
You received this message because you are subscribed to the Google Groups "Ruby 
on Rails: Talk" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/rubyonrails-talk?hl=en.

Reply via email to