Robert Walker wrote: > Mike P. wrote: >> Hello, >> >> I've decided to use multiple tables for an entity (e.g. "todo_items1," >> "todo_items2," "todo_items3," etc.), instead of just one main table >> which could end up having a lot of rows (e.g. just "todo_items"). I'm >> doing this to try and to avoid a potential future performance drop that >> could come with having too many rows in one table. > > This sounds to me like the very definition of "Premature Optimization." > > http://en.wikipedia.org/wiki/Program_optimization#When_to_optimize > > What you're describing is called "Database Sharding." > > http://en.wikipedia.org/wiki/Shard_(database_architecture) > > However, I would very highly recommend the simpler design, unless you > are having scaling problems right now. It sounds like you're at the > beginning stages of design so I doubt that is the case. > > Putting optimization before design leads to complex, difficult to > maintain systems. At present you have no verifiable metrics to determine > whether the more complex design will ever be needed. Why pay the cost > for it now? Wouldn't it be smarter to implement optimizations after you > have solid metrics to prove the need for said optimizations?
Hi Robert, Thank you for your response. I totally agree that not everything needs to be optimized upfront, but there's also a problem with waiting for the metrics: When I have those metrics that prove the need for these optimizations, the performance will already be getting affected. And depending on how closely the performance is being watched, it could already be something that's annoying to the user, and that's not good. That, and I think it's easier, in this case, to have this built-in from the start, rather than having to add this in later on which would (probably) cause me to have to shut down the database when it comes time to do the actual upgrade, which would be another inconvenience for the user. I'd also like to avoid having to move the pre-existing data around for optimization, which I know I'd end up doing if I postpone this. Creating the multiple tables now, when there's no real data, takes care of those issues and helps me get the infrastructure in place and well-tested. Also, splitting up the tables upfront will help to delay the need for some future optimization. since index performance starts to drop once the tables get too large (e.g. around one million records or so). If I make five tables then, in theory, that index performance decrease shouldn't happen until those tables reach 1 million+ records, which will take longer since the data is being spread across the tables. As for sharding, thank you for the advice, but it seems like an even more complicated solution than just using a few additional tables. When looking up sharding, I found a number people that recommend against it unless it's absolutely necessary. So, yes, sharding would be a good thing to wait for metrics before implementing it. The links I found for the cons of sharding are from Updates 2, 3 and 4 at the following site: http://highscalability.com/unorthodox-approach-database-design-coming-shard So, I'm still thinking that I want to split up the tables upfront. I do agree that "premature optimization" isn't always a good thing, but it's not always a bad thing either. :) (Also, on another note, I'm using PostgreSQL and I only have the one database server, so moving a table/database off to another server isn't an option right now. [And I know that the shards can be on the same server, but it doesn't look like something I want to implement right now.]) So, any advice on a good Ruby/Rails way to handle this? Thanks, Mike -- Posted via http://www.ruby-forum.com/. -- You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.

