[Rails-core] some (long) thoughts on migrations

Rick Bradley Sun, 19 Feb 2006 18:13:08 -0800

First, long-winded thought #1:

Today I was playing with a Rails app (Typo) and wanted to add a
migration to support a new column in one of the data tables -- purely
for some functionality that's almost certainly only of interest to me in
my little Weblogging Kingdom.  I immediately realized that if I added
migration #37, with whatever name, Tobi is likely to come along next
week and add his own migration #37, which is going to conflict with
mine.


In a way, the "total order" that the increasing sequence of migration
numbers defines ends up working against collaboration.

The first, egregiously bad, idea that occurred to me was that maybe we
should put some space between numbers (like in BASIC: 10 PRINT "hello",
20 GOTO 10).  No need to reminisce about BASIC.

I thought for a second about it and decided that what the linear
migration numbering is trying to do is resolve dependencies.  With one
author (or a tightly coordinated team) a total order is fine, and the
easiest total order is one that could be put in the filenames of the
migrations themselves, viz., the natural numbers we have now.

When the development isn't so controlled (or, in my case and certainly
others, when local installations wish to add a fork for custom
functionality) the total order imposes integration problems, and using
the natural numbers to implement the order makes things worse (i.e., I
can guarantee Tobi's going to stomp on any custom migrations I put in).
In such situations it would be nice if we could use a partial order
instead.

I thought about this for a second and realized that we're already making
use of partial orders in this way in the project:  our Rake tasks
(default and in lib/).

Am I suggesting that we use rake tasks for migrations?  No.  Just that
Rake is a well-thought-out example of how we might use Ruby (which
we're already using in Migrations) to handle dependent tasks.  Perhaps a
DSL for specifying migration dependencies.

Anyway, I started brainstorming that if someone wanted to add a custom
migration that depended upon migration #36 they'd write their migration,
not use a number for the naming of it, and at the top of the file use a
DSL statement similar to a Rake dependency rule to say "hey, I'm
dependent on this other migration".
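
To make the idea concrete, here is a rough sketch in plain Ruby of what
"named migrations plus dependency rules" could look like.  Everything in it
is hypothetical (the MigrationGraph class, the migration/depends_on names,
and the example migration names are made up for illustration); only TSort
is real, and it comes with Ruby's standard library.  This is not how
Migrations work today, just one way a partial order could be resolved:

```ruby
require "tsort"

# Hypothetical registry of named migrations with Rake-style dependencies.
class MigrationGraph
  include TSort

  def initialize
    @deps = {}
  end

  # Declare a migration and the migrations it depends on.
  def migration(name, depends_on: [])
    @deps[name] = Array(depends_on)
  end

  # Hooks required by TSort: enumerate nodes, and each node's dependencies.
  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(name, &block)
    @deps.fetch(name).each(&block)
  end

  # One valid run order: every migration comes after its dependencies.
  def run_order
    tsort
  end
end

graph = MigrationGraph.new
graph.migration :create_articles
graph.migration :add_comments,        depends_on: :create_articles
graph.migration :ricks_custom_column, depends_on: :add_comments
graph.migration :tobis_new_table,     depends_on: :add_comments

order = graph.run_order
```

The two custom migrations both depend on add_comments but not on each
other, so neither author has to claim "the next number"; any order that
keeps dependencies first is acceptable.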

Then I realized that the schema_info table and associated logic rely on
the natural-number total ordering for their succinct assessment of
whether anything needs to be done on a 'rake migrate'.  Things begin to
get complicated (the obvious solution being to go from 1 row in
schema_info to 1 row per migration file) and I'm not clear that it's
worth pursuing.  I.e., is the problem just an annoyance for me or does
it affect others, do they care, are there other solutions, etc.?
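
For what it's worth, the "1 row per migration file" bookkeeping might be
sketched like this.  The Set below is only a stand-in for the rows of a
reworked schema_info table, and the migration names are invented for the
example; no real Rails API is being shown:

```ruby
require "set"

# Stand-in for a schema_info table holding one row per applied migration.
applied = Set.new(%w[create_articles add_comments])

# Names of the migration files present in the current checkout (illustrative).
available = %w[create_articles add_comments ricks_custom_column tobis_new_table]

# With one row per migration, "what needs to run?" becomes a set difference
# instead of a comparison against a single version number.
pending = available.reject { |name| applied.include?(name) }

pending.each do |name|
  # A real runner would call the migration's up method here.
  applied << name
end
```

The cost is that the check is no longer a single integer comparison, which
is the complication mentioned above.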

So, I figured I'd pass this along in case this was of interest to
someone else.


Then I realized I had another thing to go on about (hopefully
not-so-longwinded thought #2):

I've got a chunk of Rails code that was written back around Rails 0.0.0 or so
that I want to upgrade.  No big deal, really, that's just file manipulations.
Having used this thing (it's the so-called "Accountomatic") for years, starting
back in PHP-land, and migrating to Rails, etc., I've come to realize that the
data model needs certain specific changes.  So the tack I'm going to take is to
put the thing onto Migrations and then migrate the data step by step to the new
model.  Cool.

Actually, I've already started this process, and I've run into a headache.  I
think /I/ have an out, but it points to a bigger issue.  I think this is going
to be a headache for a number of people from time to time so it's probably
worth covering.

I've got a number of accounting models (Account, Person, Tran (transaction),
Budget, Period, etc.) in the system, all with data in a live database.  I need
to refactor the data model so that instead of simply having The Simplest Thing
That Could Possibly Work Circa 1999, I've instead got a reasonable data model.
So a Tran object which stores a "transaction" (which should be rightly called
an "entry" since it's not transactional in any real sense), meaning an amount,
a time, a Person, and an Account will be turned into two Entry objects which
are linked to a Transaction object -- the goal being to go to a double-entry
bookkeeping system with multi-legged transactions, memo accounts, etc.

So... I begin writing the migrations one by one to start transforming the data
I've got into the data I want.  Something comes up, though.

If I'm going to add models (Entry, Transaction, etc.) to replace old models
(Tran in this case, but there are other things happening to Account) then I
should presumably:

 (1) add the new model classes so I can do 

 def self.up
   # ...
   Tran.find(:all).each do |t|
     Entry.create ...
     Transaction.create ...
   end
 end

 (2) get rid of the old model classes

But, here's the problem:  we make these migrations so that we can upgrade
systems that can be anywhere along the upgrade path.  Let's say at revision
#200 I add the migrations to convert over Tran -> Entry + Transaction.  Then at
revision #201 I go ahead and get rid of the Tran model class, since it's no
longer needed.  Then I go on for a few more revisions doing coding, adding
migrations, etc.

Now, a month later some user comes along and decides to upgrade his install.
He's at revision #199, which was the stable revision for months (true in this
case, modulo exact numbering).  He does a pull and gets himself up to current
at revision #215.  He runs 'rake migrate'.  

Boom.

What happened?

Well, the first migration says "Tran.find", but Tran is gone, so that's not
going to fly.
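
The failure can be imitated in plain Ruby without a database.  The classes
below are empty stand-ins for the models present in the checkout at
revision #215, and all_rows is a made-up method name; the point is only
that the old migration's code is evaluated against the new code base:

```ruby
# Stand-ins for the model classes that exist in the current checkout.
# Tran was deleted at revision #201, so it is deliberately not defined.
class Entry; end
class Transaction; end

# The migration body written at revision #200, when Tran still existed.
# (all_rows is a hypothetical finder used only for this illustration.)
convert_trans = lambda do
  Tran.all_rows.each do |t|
    Entry.new
    Transaction.new
  end
end

begin
  convert_trans.call
  outcome = "migrated"
rescue NameError => e
  # Ruby cannot resolve the Tran constant any more.
  outcome = "failed: #{e.message}"
end
```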

That's really sort of the simple case -- imagine if, instead of doing model
replacement I were adding a 'before_save' to a model, and that before_save was
doing some work on a column (for instance you might have a 'body_html' column
to store rendered content, or a cache of some type...).  Well, that column is
going to be added by a migration at some point (basically, concurrent with the
addition of the before_save).  But, if you're back on an old revision, do an
svn update, and then run rake migrate, if any of the earlier migrations do things
to the model with the new before_save on it...  Boom.  That column's not there
yet, but the before_save is already in the code, and it needs that column.
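
Again as a database-free imitation: the Article class below is a
much-simplified, hypothetical model (it is not ActiveRecord), with a fixed
column list standing in for the table as it looks before the body_html
migration has run:

```ruby
class Article
  # Columns that exist in the database before the body_html migration runs.
  COLUMNS = [:title, :body].freeze

  def initialize(attrs)
    @attrs = attrs
  end

  # Callback from the current code: it assumes body_html already exists.
  def before_save
    write(:body_html, "<p>#{@attrs[:body]}</p>")
  end

  def write(column, value)
    raise ArgumentError, "unknown column: #{column}" unless COLUMNS.include?(column)
    @attrs[column] = value
  end

  def save
    before_save
    true
  end
end

# An earlier migration that merely saves a row through the model.
begin
  Article.new(title: "Hello", body: "text").save
  result = "saved"
rescue ArgumentError => e
  result = "failed: #{e.message}"
end
```

Here the earlier migration never mentions body_html at all; it fails only
because the model it loads is newer than the schema it is running against.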

(I'm suspicious that Typo has this particular problem, fwiw, depending on when
the end user does his/her pulls.)

If the end user instead pulls down every SVN revision one by one (#200, #201,
#202, ...) and runs 'rake migrate' each time then s/he should probably be safe,
otherwise the trunk code can get out of sync with the earlier migrations,
causing problems.

One way the developer can deal with this (what I've been muddling through in a
tough spot in my case, in fact) is to do things like:

  def self.up
    # ...
    ActiveRecord::Base.connection.select_all(...).each do |foo|
      execute "insert into bar (...) values (...)"
    end
  end

Blech.

But this doesn't catch everything (it does nothing for the before_save
example, e.g.).
Ultimately, the developer can't (and shouldn't have to) predict what the future
is going to bring, and shouldn't have to code around this sort of problem.

Best,
Rick
-- 
 http://www.rickbradley.com    MUPRN: 64
                       |  that they collect the
   random email haiku  |  sales tax but don't ever
                       |  pay it to our state.