[Rails] Re: Re: Re: Best practice around putting rails database info int

Marnen Laibow-Koser Thu, 22 Apr 2010 13:36:12 -0700

Rob Biedenharn wrote:
> On Apr 22, 2010, at 9:06 AM, Marnen Laibow-Koser wrote:
> 
>> Rob Biedenharn wrote:
>> [...]
>>> Matt, Pito, Marnen, and anyone else,
>>>
>>> 1. The opinion on whether db/schema.rb goes into the source repository
>>> has changed over time.
>>
>> No.  I've used Rails since 1.2.6.  Every version has put a comment in
>> the schema.rb file that recommends putting it into version control.
> 
> I've been using Rails since 0.13 and I'll restate that this opinion
> has changed.
> I can't find it at the moment, but there WAS a version that specifically
> said in that comment to NOT put the file into source control. At least in
> 1.2.2, there is no comment either way.


I'll get out my old versions of Rails and look.  However, it's pretty 
much irrelevant, because look what current versions say:

# Note that this schema.rb definition is the authoritative source for your
# database schema. If you need to create the application database on another
# system, you should be using db:schema:load, not running all the migrations
# from scratch. The latter is a flawed and unsustainable approach (the more
# migrations you'll amass, the slower it'll run and the greater likelihood for
# issues).
#
# It's strongly recommended to check this file into your version control
# system.

That's pretty definitive.


[...]
>> But you should never be running old migrations in the first place.  If
>> you need version 1000 of the schema for a new installation, then don't
>> start at zero and run 1000 migrations -- just do rake db:schema:load and
>> have done with it.  This is the core team's recommendation, and I think
>> it's a good one.
> 
> I have helped other developers who have created several migrations in
> development, which were applied approximately when created, that were
> subsequently unable to run when deployed to production.  

And why was this so?

> The very same
> recommendations that guard against this kind of problem will make it
> possible to run those 1000 migrations (or any subsequence) without
> problem.

No.  There is a difference between running a days-old migration and a 
years-old migration.

> 
> 
>> merge the schema.rb files.  If not, get a better VCS.
> Yikes! No! The database itself holds the official version of the
> schema. If I merge changes from a master branch into my development
> branch, I will run any new migrations, but I certainly don't want some
> merge tool to give me a new schema.rb.  

For *changing* the schema on an existing installation, I agree with you 
-- run the migrations.  For a new installation, use rake db:schema:load. 
That implies that you need schema.rb in the VCS.  How much more clearly 
can I say this?
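
As a rough illustration of that rule (a sketch only: `rake_task_for` is a made-up helper name, not part of Rails or Rake), the choice comes down to whether the database already has tables in it:

```ruby
# Hypothetical helper: picks the Rake task for a database, given the
# names of the tables it already contains.
def rake_task_for(existing_tables)
  if existing_tables.empty?
    "db:schema:load"   # new installation: build the whole schema in one step
  else
    "db:migrate"       # existing installation: apply only the pending changes
  end
end

puts rake_task_for([])                                  # fresh database
puts rake_task_for(%w[users posts schema_migrations])   # live database
```

The first branch only works on a fresh checkout if schema.rb is actually in the repository.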

> Depending on the actual
> content of the migrations on different branches and the order in which
> they are run, the *actual* schema might be slightly different due to
> the rules for where new columns are placed on a table.

The order of columns on a DB table is immaterial.  You should be able to 
run rake db:schema:load for a new installation and immediately get a 
usable DB.
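
To make the column-order point concrete, here is a minimal sketch in plain Ruby. The schema representation (a hash of table name to [column, type] pairs) and the name `same_schema?` are assumptions for illustration; nothing here is Rails API.

```ruby
# Hypothetical comparison: two schemas count as equal if every table has
# the same set of columns and types, regardless of the order in which the
# columns were added.
def same_schema?(a, b)
  normalize = lambda do |schema|
    schema.map { |table, columns| [table, columns.sort] }.sort
  end
  normalize.call(a) == normalize.call(b)
end

# Same two migrations, run in a different order on each branch.
branch_a = { "users" => [["name", "string"], ["email", "string"], ["age", "integer"]] }
branch_b = { "users" => [["name", "string"], ["age", "integer"], ["email", "string"]] }

# A genuinely different schema: the email column is missing.
branch_c = { "users" => [["name", "string"], ["age", "integer"]] }

puts same_schema?(branch_a, branch_b)
puts same_schema?(branch_a, branch_c)
```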

> 
> 
> 
>> You've
>> got it completely backwards.
> 
> You assume that I need to run db:schema:load, which *I* don't.

Why do you believe you don't?  Because you don't understand what it's 
for?  Because you never do new installations?

I'll submit that you *do* need to run db:schema:load, and you just don't 
know it because you don't understand what it's good for.

> 
> 
>>
>>> If you think about why the migration numbering (file naming) was changed
>>> from sequence number to timestamp, you'll realize that the practice of such
>>> "interleaved" migrations was a much bigger pain-point than what to do
>>> about db/schema.rb.
>>
>> I don't really understand what you're getting at here.
> 
> When migrations were sequentially numbered, two developers on separate
> branches might both create migration 005 for different purposes. This
> was a problem.  The chance that two developers both create migration
> 20100422105524 is acceptably small. 

OK, now I see what you mean.  Yes.
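
For anyone following along, the collision Rob describes can be sketched in plain Ruby. The helper names and migration file names below are invented for illustration; the timestamp format is the 14-digit UTC one seen in the number he quotes.

```ruby
# Hypothetical helper: next sequential prefix, given the migration files
# a developer can see on their branch.
def next_sequential_prefix(filenames)
  last = filenames.map { |f| f[/\A\d+/].to_i }.max || 0
  format("%03d", last + 1)
end

# Hypothetical helper: timestamp prefix in YYYYMMDDHHMMSS form (UTC).
def timestamp_prefix(time)
  time.utc.strftime("%Y%m%d%H%M%S")
end

shared = ["001_create_users.rb", "002_create_posts.rb",
          "003_add_email_to_users.rb", "004_create_comments.rb"]

# Two developers branch from the same point and each add a migration.
alice_seq = next_sequential_prefix(shared)
bob_seq   = next_sequential_prefix(shared)
puts "sequential: #{alice_seq} vs #{bob_seq}"   # both get the same number

# With timestamps they collide only if created in the same second.
alice_ts = timestamp_prefix(Time.utc(2010, 4, 22, 10, 55, 24))
bob_ts   = timestamp_prefix(Time.utc(2010, 4, 22, 10, 55, 25))
puts "timestamp:  #{alice_ts} vs #{bob_ts}"
```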

> Of course, they will probably be
> executed in a different order and if they both add a column to the
> same table, those columns will likewise be in a different order. 

So what?  Column order is immaterial.

> If db/schema.rb is in the repository, then lots of commits will have
> effectively meaningless changes 

How do you figure that?

> and unless the current HEAD has *my*
> version, it is *not* going to truly represent what's in my database
> schema.

Sure it is -- modulo immaterial things like column order.

> 
>>
>>>
>>> 4. Unless you're initializing (inserting) data via migration, running
>>> all the migrations is really not much different than doing a
>>> db:schema:load because all the migrations are operating on empty
>>> tables.
>>
>> How can you say this with a straight face?  There is no reason *at all*
>> to run lots of migrations rather than doing a simple schema load.
> 
> The reason is deploying to an existing production database.  You can't
> do a schema load.  

Right.  That's what migrations are for.  That's the *only* thing 
migrations are for -- changing an existing database.

> The proper way to apply the "new" migrations (which
> might be kinda old if the last production deploy wasn't so recent)

Then you've got bigger problems.  If you can't do lots of little 
deploys, then you will wind up in integration hell anyway, and 
migrations will be the least of your worries.

> is
> a db:migrate. As I've stated, you can get into trouble with mismatches
> between migrations (which don't change after being created and
> *shouldn't* if created properly) and models (which obviously change
> over time).

If you're deploying frequently enough, there will be no issue.

> 
> I have projects with 210 and 204 migrations as well as many with
> fewer. Some of the migrations deal with rather nasty data
> manipulations to maintain data relationships when the associations are
> flipped around. It's not something that I would recommend, but the
> definition of "reasonable" can change dramatically when a client
> shifts the way he thinks about the data and its evolution.
> 
> As an experiment, I set up a new environment for the 204 migration
> project and ran the migrations from scratch. It takes about 8
> minutes.  

That's a long time, but that's the least of the problems.

> There is about 1 minute of startup time, there are several
> migrations that load some data including one that takes a bit over 3
> minutes to put a few tens of thousands of research datapoints into a
> set of tables.  

Wow, it just gets worse. :)  Seed data never, ever belongs in migrations 
-- partly because it makes schema loading impossible, and partly because 
migrations are only about the schema, not the data.  Use seed-fu or 
Rails 2.3's built-in seeding (db/seeds.rb, run with rake db:seed).
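
A minimal sketch of what seed code kept outside migrations might look like. So that it runs without Rails, `Role` below is a hypothetical in-memory stand-in for a model, and `SEED_ROLES` and `seed_roles` are invented names; in a real app the body of `seed_roles` would live in the seed file and call the actual model.

```ruby
# Hypothetical stand-in for a model class, so the sketch runs without Rails.
class Role
  @rows = []

  class << self
    attr_reader :rows

    # Returns the existing row with this name, or creates one.
    def find_or_create_by_name(name)
      @rows.find { |row| row[:name] == name } || (@rows << { name: name }).last
    end
  end
end

SEED_ROLES = %w[admin editor reader]

# Idempotent seeding: safe to run on a fresh database or an existing one.
def seed_roles
  SEED_ROLES.each { |name| Role.find_or_create_by_name(name) }
end

seed_roles
seed_roles   # a second run finds the existing rows and adds nothing
puts Role.rows.size
```

Because the seeding is separate from the schema, a new installation can load schema.rb first and then seed, with no migration involved.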

> I'm OK with that amount of time for something as
> significant as creating a new environment.

Yes, it's a one-time task.  But it's still wrong. :)

> 
> 
>>
>>> (Besides, if you have to "scale up your app", you probably
>>> aren't adding a new empty database, but creating a master-slave or
>>> sharding for performance.)
>>
>> Another red herring.
> 
> Well, if you can "scale up your app" by starting from an empty schema,
> go ahead. Perhaps initializing the shards, but then I'd start with the
> db/schema.rb from production, not something from the repository which
> almost certainly reflects a development environment however close to
> production that might be.

Still a red herring.  I want new installations to be easy to create -- 
not for scalability, which would certainly use DB replication, but for 
people actually creating a new, independent instance of the app -- say, 
a test environment, or simply someone else installing a Rails app that's 
meant for external use.

> 
>>
>> Rob, I know you know a lot about the Rails framework, but your advice
>> here will make dealing with databases far more difficult than it needs
>> to be.
> 
> If I help someone who recalls some of these nuggets of my wisdom and
> experience at a time where migrations give them trouble, then I will
> have made a positive difference.
> 
> However, it is that same experience that has led me to the conclusion
> that keeping db/schema.rb in the source repository is wrong.  It is
> derived data and I would no more put it into the repository than I
> would have someone put their object files compiled from C or their
> class files compiled from Java in there.

schema.rb is certainly derived data to a point. However, it is also the 
only reliable source of information about what the DB schema should look 
like at a given point in history.  I probably would not have thought to 
put it into version control had I not seen that note from the core team, 
but having done so and thought about it, I believe that it must go into 
version control.  It contains information that cannot be reliably 
derived except by going through the process that it was designed to 
circumvent.

To continue the compilation analogy: you might not put your binaries 
into the repository with the source code -- but you'd surely want to 
make them available for people who didn't care to build from source.

There are many good reasons to have schema.rb in version control.  There 
are few if any to do otherwise.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org