Hi Kushagra,

On the whole, I think this proposal is looking fairly good. Your high-level 
explanation of the problem is solid, and you've given enough detail of the 
direction you intend to take the project that it gives me some confidence that 
you understand what you're proposing to do.

I have a couple of small concerns:

 * You aren't ever going to eat your own dogfood. You're spending the GSoC 
building an API that is intended for use with schema migration, but you're 
explicitly not looking at any part of the migration process that would actually 
use that API. How will we know that the API you build is actually fit for its 
intended purpose? How do we know that the requirements of "step 2" of 
schema migration will be met by your API? I'd almost prefer to see more depth, 
and less breadth -- i.e., show me a fully functioning schema migration stack on 
just one database, rather than a fully functioning API on all databases that 
hasn't actually been shown to work in practice.

 * It feels like there's a lot of padding in your schedule. 

   - A week of discussion at the start
   - 2 weeks for a "base" migration API
   - 2.5 weeks to write documentation
   - 2 "buffer" weeks 

Your project is proposing the development of a low level database API. While 
this should certainly be documented, if it's not going to be "user facing", the 
documentation requirements aren't as high. Also, because it's a low level 
database API, I'm not sure what common tools will exist -- yet your schedule 
estimates 1/6 of your overall time, and 1/3 of your active coding time, will be 
spent building these common tools. Having 1/6 of your project schedule as 
contingency is very generous; and you don't mention what you plan to look at if 
you don't have to use that contingency.

 * Your references to testing are a bit casual for my taste. From my 
experience, testing schema migration code is hard. Normal view code and 
utilities are easy to test -- you set up a test database, insert some data, and 
check functionality. However, schema migration code is explicitly about making 
database changes, so the thing that Django normally considers "static" -- the 
database models -- are subject to change, and that isn't always an easy thing 
to accommodate. I'd be interested to see your thoughts on how you plan to test 
your API.

 * Your proposal doesn't make any reference to the existing "migration-like" 
tasks in Django's codebase. For example, we already have code for creating 
tables and adding indices. How will your migration code use, modify or augment 
these existing capabilities?
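
To make the testing concern above concrete, here is a minimal sketch of the kind of test I have in mind. It assumes an in-memory SQLite database and a hypothetical add_column primitive (not an existing Django or South function); the point is that the test inspects the live schema instead of relying on model definitions.

```python
import sqlite3
import unittest


def add_column(conn, table, column, column_type):
    """Hypothetical migration primitive: add a column to an existing table."""
    # Identifiers are quoted by hand; a real backend would use its own quoting.
    conn.execute('ALTER TABLE "%s" ADD COLUMN "%s" %s' % (table, column, column_type))


def column_names(conn, table):
    """Read column names from the live schema, not from model definitions."""
    return [row[1] for row in conn.execute('PRAGMA table_info("%s")' % table)]


class AddColumnTest(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh database in a known starting state.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute('CREATE TABLE "author" ("id" INTEGER PRIMARY KEY, "name" TEXT)')
        self.conn.execute('INSERT INTO "author" ("name") VALUES (?)', ("Alice",))

    def tearDown(self):
        self.conn.close()

    def test_add_column_keeps_existing_rows(self):
        add_column(self.conn, "author", "email", "TEXT")
        # The schema changed as requested...
        self.assertEqual(column_names(self.conn, "author"), ["id", "name", "email"])
        # ...and existing data survived, with the new column left empty.
        rows = self.conn.execute('SELECT "name", "email" FROM "author"').fetchall()
        self.assertEqual(rows, [("Alice", None)])
```

Each backend would presumably need its own introspection helper in place of the SQLite-specific PRAGMA call.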

Yours,
Russ Magee %-)

On 01/04/2012, at 5:02 PM, j4nu5 wrote:

> Less than a week remains until the student application deadline. Can someone
> please comment on the above revised proposal? Thanks a lot.
> 
> On Monday, 26 March 2012 01:29:35 UTC+5:30, j4nu5 wrote:
> Here is a revised proposal.
> 
> Abstract
> ------------------------------------------------------------------------------
> A database migration helper has been one of the most long standing feature
> requests in Django. Though Django has an excellent database creation helper,
> when faced with schema design changes, developers have to resort to either
> writing raw SQL and manually performing the migrations, or using third party
> apps like South[1] and Nashvegas[2].
> 
> [1] http://south.aeracode.org/
> [2] https://github.com/paltman/nashvegas/
> 
> Clearly Django will benefit from having a database migration helper as an
> integral part of its codebase.
> 
> From the summary on django-developers mailing list[3], the task of building a
> migrations framework will involve:
> 1. Add a db.backends module to provide an abstract interface to migration
>    primitives (add column, add index, rename column, rename table, and so on).
> 2. Add a contrib app that performs the high level accounting of "has migration
>    X been applied", and management commands to "apply all outstanding
>    migrations"
> 3. Provide an API that allows end users to define raw-SQL migrations, or
>    native Python migrations using the backend primitives.
> 4. Leave the hard task of determining dependencies, introspection of database
>    models and so on to the toolset contributed by the broader community.
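
As an illustration of the "step 2" accounting in the list above, here is a rough sketch, assuming SQLite and a hypothetical migration_history table. The names apply_outstanding and applied_migrations are made up for this example and are not part of Django or South.

```python
import sqlite3


def ensure_history_table(conn):
    """Create the (hypothetical) bookkeeping table if it is missing."""
    conn.execute('CREATE TABLE IF NOT EXISTS "migration_history" ("name" TEXT PRIMARY KEY)')


def applied_migrations(conn):
    """Answer "has migration X been applied" for every recorded X."""
    return {row[0] for row in conn.execute('SELECT "name" FROM "migration_history"')}


def apply_outstanding(conn, migrations):
    """Apply, in order, every migration not yet recorded.

    `migrations` is an ordered list of (name, callable) pairs, where each
    callable receives the connection. Returns the names applied by this call.
    """
    ensure_history_table(conn)
    done = applied_migrations(conn)
    newly_applied = []
    for name, forwards in migrations:
        if name in done:
            continue
        forwards(conn)
        conn.execute('INSERT INTO "migration_history" ("name") VALUES (?)', (name,))
        newly_applied.append(name)
    conn.commit()
    return newly_applied
```

Even this small amount of bookkeeping would exercise the step 1 primitives, which is one way to check that they are fit for purpose.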
> 
> [3] http://groups.google.com/group/django-developers/msg/cf379a4f353a37f8
> 
> I would like to work on the 1st step as part of this year's GSoC.
> 
> 
> Implementation plan
> ------------------------------------------------------------------------------
> The idea is to have a CRUD interface to database schema (with some additional
> utility functions for indexing etc.) with functions like:
> * create_table
> * rename_table
> * delete_table
> * add_column
> and so on, which will have the *explicit* names of the table/column to be
> modified as its parameter. It will be the responsibility of the higher level
> API caller (will not be undertaken as part of GSoC) to translate model/field
> names to explicit table/column names. These functions will be directly
> responsible for modifying the schema, and any interaction with the database
> schema will take place by calling these functions. Most of these functions
> will come from South.
> 
> These API functions will also have a "dry-run" or test mode, in which they
> will output raw SQL representation of the migration or display errors if they
> occur. This will be useful in:
> 1. The MySQL backend. MySQL does not have transaction support for schema
>    modification and hence the migrations will be run in a dry run mode first
>    so that any errors can be captured before altering the schema.
> 2. The django-admin commands sql and sqlall that return the SQL (for creation
>    and indexing) for an app. They will capture the SQL returned from the API
>    running in dry run mode.
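
The explicit-name primitives and the dry-run mode described above might look roughly like this. It is only a sketch: SchemaEditor and its methods are hypothetical names chosen for the example, and the SQL is SQLite-flavoured.

```python
import sqlite3


class SchemaEditor(object):
    """Hypothetical primitives that take explicit table and column names."""

    def __init__(self, conn, dry_run=False):
        self.conn = conn
        self.dry_run = dry_run
        self.collected_sql = []  # every statement, whether executed or not

    def _run(self, sql):
        self.collected_sql.append(sql)
        if not self.dry_run:
            self.conn.execute(sql)

    def create_table(self, table, columns):
        # `columns` is a list of (name, type) pairs.
        cols = ", ".join('"%s" %s' % (name, ctype) for name, ctype in columns)
        self._run('CREATE TABLE "%s" (%s)' % (table, cols))

    def rename_table(self, old, new):
        self._run('ALTER TABLE "%s" RENAME TO "%s"' % (old, new))

    def delete_table(self, table):
        self._run('DROP TABLE "%s"' % table)

    def add_column(self, table, column, ctype):
        self._run('ALTER TABLE "%s" ADD COLUMN "%s" %s' % (table, column, ctype))


# Dry run: statements are collected for display and nothing touches the database.
conn = sqlite3.connect(":memory:")
editor = SchemaEditor(conn, dry_run=True)
editor.create_table("book", [("id", "INTEGER PRIMARY KEY"), ("title", "TEXT")])
editor.add_column("book", "isbn", "TEXT")
print("\n".join(editor.collected_sql))
```

Note that a dry run of this kind only collects SQL; it cannot report errors that the database itself would raise at execution time, which matters for the MySQL use case.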
> 
> As for the future of the current Django creation API, it will have to be
> refactored (not under GSoC) to make use of the 'create' part of our new CRUD
> interface, for consistency purposes.
> 
> The GeoDjango backends will also have to be refactored to use the new API.
> Since they build upon the base code in db.backends, db.backends will have to
> be refactored first.
> 
> Last year xtrqt had written, documented and tested code for at least the
> SQLite backend[4]. As per Andrew's suggestion, I would not be relying too much
> on that code but some parts can still be salvaged.
> 
> [4] 
> https://groups.google.com/forum/?fromgroups#!searchin/django-developers/xtrqt/django-developers/pSICNJBJRy8/Hl7frp-O-dMJ
> 
> 
> Schedule and Goal
> ------------------------------------------------------------------------------
> Week 1     : Discussion on API design and writing tests
> Week 2-3   : Developing the base migration API
> Week 4     : Developing extensions and overrides for PostgreSQL
> Week 5-6   : Developing extensions and overrides for MySQL
> Week 7-8.5 : Developing extensions and overrides for SQLite (may be shorter or
>              longer (by 0.5 week) depending on how much of xtrqt's code is
>              considered acceptable)
> Week 8.5-10: Writing documentation and leftover regression tests, if any
> Week 11-12 : Buffer weeks for the unexpected
> 
> I will consider my project to be successful when we have working, tested and
> documented migration primitives for Postgres, MySQL and SQLite. If we can
> develop a working fork of South to use these primitives, that will be a strong
> indicator of the project's success.
> 
> 
> About me and my inspiration for the project
> ------------------------------------------------------------------------------
> I am Kushagra Sinha, a pre-final year student at Institute of Technology
> (about to be converted to an Indian Institute of Technology),
> Banaras Hindu University, Varanasi, India.
> 
> I can be reached at:
> Gmail: sinha.kushagra
> Alternative email: kush [at] j4nu5 [dot] com
> IRC: Nick j4nu5 on #django-dev and #django
> Twitter: @j4nu5
> github: j4nu5
> 
> I was happily using PHP for nearly all of my webdev work since my high school
> days (CakePHP being my framework of choice) till I was introduced to Django
> a year and a half ago. Comparing Django with CakePHP (which is Ruby on Rails
> inspired) I felt more attached to Django's philosophy than RoR's "hidden
> magic" approach. I have been in love ever since :)
> 
> Last year I had an internship at MobStac[5] (BusinessWorld magazine India's
> hottest young startup[6]). Their stack is on Django+MySQL. I was involved in
> a heavy database migration that involved their analytics platform. Since they
> had not been using a migrations framework, the situation looked grim.
> Fortunately, South came to the rescue and we were able to carry out the
> migration but it left everyone a little frustrated and clearly in want of a
> migrations framework built within Django itself.
> 
> [5] http://mobstac.com/
> [6] 
> http://blog.mobstac.com/blog/2011/06/businessworld-declares-mobstac-indias-hottest-young-startup/
> 
> 
> Experience
> ------------------------------------------------------------------------------
> I have experience working in a high voltage database migration through my
> internship as stated before. I am also familiar with Django's contribution
> guidelines and have written a couple of patches[7]. One patch has been
> accepted and the second got blocked by 1.4's feature freeze.
> My other projects can be seen on my github[8]
> 
> [7] https://code.djangoproject.com/query?owner=~j4nu5
> [8] https://github.com/j4nu5
> 
> On Mon, Mar 19, 2012 at 5:03 AM, Russell Keith-Magee 
> <russ...@keith-magee.com> wrote:
> 
> On 18/03/2012, at 7:38 PM, Kushagra Sinha wrote:
> 
> > Abstract
> > ------------------------------------------------------------------------------
> > A database migration helper has been one of the most long standing feature
> > requests in Django. Though Django has an excellent database creation helper,
> > when faced with schema design changes, developers have to resort to either
> > writing raw SQL and manually performing the migrations, or using third party
> > apps like South[1] and Nashvegas[2].
> >
> > Clearly Django will benefit from having a database migration helper as an
> > integral part of its codebase.
> >
> > From [3], the consensus seems to be on building a Ruby on Rails ActiveRecord
> > Migrations[4] like framework, which will essentially emit python code after
> > inspecting user models and current state of the database.
> 
> Check the edit dates on that wiki -- most of the content on that page is 
> historical, reflecting discussions that were happening over 3 years ago. 
> There have been many more recent discussions.
> 
> The "current consensus" (at least, the consensus of what the core team is 
> likely to accept) is better reflected by the GSoC project that was accepted, 
> but not completed last year. I posted to Django-developers about this a week 
> or so ago [1]; there were some follow up conversations in that thread, too 
> [2].
> 
> [1] http://groups.google.com/group/django-developers/msg/cf379a4f353a37f8
> [2] http://groups.google.com/group/django-developers/msg/2f287e5e3dc9f459
> 
> > The python code
> > generated will then be fed to a 'migrations API' that will actually handle
> > the task of migration. This is the approach followed by South (as opposed to
> > Nashvegas's approach of generating raw SQL migration files). This ensures
> > modularity, one of the trademarks of Django.
> 
> I don't think you're going to be able to ignore raw SQL migrations quite that 
> easily. Just like the ORM isn't able to express every query, there will be 
> migrations that you can't express in any schema migration abstraction. Raw 
> SQL migrations will always need to be an option (even if they're feature 
> limited).
> 
> > Third party developers can create
> > their own inspection and ORM versioning tools, provided the inspection tool
> > emits python code conforming to our new migrations API.
> >
> > To sum up, the complete migrations framework will need, at the highest level:
> > 1. A migrations API that accepts python code and actually performs the
> >    migrations.
> 
> This is certainly needed. I'm a little concerned by your phrasing of an "API 
> that accepts python code", though. An API is something that Python code can 
> invoke, not the other way around. We're looking for 
> django.db.backends.migration as an analog of django.db.backends.creation, not 
> a code consuming utility library.
> 
> > 2. An inspection tool that generates the appropriate python code after
> >    inspecting models and current state of database.
> 
> The current consensus is that this shouldn't be Django's domain -- at least, 
> not in the first instance. It might be appropriate to expose an API to 
> extract the current model state in a Pythonic form, but not a fully-fledged, 
> user accessible "tool".
> 
> > 3. A versioning tool to keep track of migrations. This will allow 'backward'
> >    migrations.
> 
> If backward migrations are the only reason to have a versioning tool, then I'd 
> argue you don't need versioning.
> 
> However, that's not the only reason to have versioning, is it :-)
> 
> > South's syncdb:
> > class Command(NoArgsCommand):
> >     def handle_noargs(self, migrate_all=False, **options):
> 
> As a guide for the future -- large wads of code like this aren't very 
> compelling as part of a proposal unless you're trying to demonstrate 
> something specific. In this case, you're just duplicating some of South's 
> internals -- "I'm going to take South's lead" is all you really needed to say.
> 
> > If migrations become a core part of Django, every user app will have a
> > migration folder(module) under it, created at the time of issuing
> > django-admin.py startapp. Thus by modifying the startapp command to create a
> > migrations module for every app it creates, we will be able to use South's
> > syncdb code as is and will also save the user from issuing
> > schemamigration --initial for all his/her apps.
> >
> > Now that we have a guaranteed migrations history for every user app, migrate
> > command will also be more or less a copy of South's migrate command.
> 
> What does this "history" look like? Are migrations named? Are they dated? 
> Numbered? How do you handle dependencies? Ordering? Collisions between 
> parallel development?
> 
> *This* is the sort of thing a proposal should be elaborating.
> >
> > As much as I would have liked to use Django creation API's code for creating
> > and destroying models, we cannot. The reason for this is Django's creation
> > API uses its inspection tools to generate *SQL* which is then directly fed to
> > cursor.execute. What we need is a migrations API which gobbles up *python*
> > code generated by the inspection tool. Moreover deprecating/removing
> > Django's creation API to use the new migrations API everywhere will give rise
> > to performance issues since time will be wasted in generating python code and
> > then converting python to SQL for Django's core apps which will never have
> > migrations anyways.
> 
> This sounds like a false economy to me. If we're talking about the core 
> pipeline for handling a HTTP request, then every method call and abstraction 
> counts. However, that's not what we're talking about. We're talking about 
> utilities used to synchronize the database. They're called by manual 
> invocation, infrequently, and *never* as part of the request/response cycle.
> 
> Yes, there will probably be a slowdown -- but we get the benefit of a 
> consistent interface to database creation. However, unless the slowdown to 
> syncdb is such that it becomes *seriously* observable -- e.g., turns syncdb 
> into a 1 minute operation, rather than a 1 second operation -- then you're 
> advocating for duplicating code paths in order to maintain a false economy.
> 
> > The creation API and code that depends on it (syncdb, sql, django.test.simple
> > and django.contrib.gis.db.backends) will be left as is.
> >
> > Therefore much of the code for our new migrations API will come from South.
> 
> Again, the code snippet highlights nothing here. Anyone qualified to review 
> your proposal is at least familiar with South, so there's no need to give a 
> page long example of South's usage unless you're trying to say something 
> specific about South's API and usage.
> 
> > Schedule and Goal
> > ------------------------------------------------------------------------------
> > Week 1    : Discussion on API design and overriding django-admin startapp
> > Week 2-3  : Developing the base migration API
> > Week 4    : Developing migration extensions and overrides for PostgreSQL
> > Week 5    : Developing migration extensions and overrides for MySQL
> > Week 6    : Developing migration extensions and overrides for SQLite
> > Week 7    : Developing the inspection tools
> > Week 8    : Developing the ORM versioning tools and glue code
> > Week 9-10 : Writing tests/documentation
> > Week 11-12: Buffer weeks for the unexpected, Oracle DB? and
> >             django.contrib.gis.backends?
> >
> 
> Week 13 - profit.
> 
> Seriously, this is a very unconvincing timetable. What are you basing these 
> estimates on?
> 
> Some of the things that raise flags for me:
> 
>  * What makes you think that MySQL, PostgreSQL and SQLite are all equally 
> complex when it comes to migrations? SQLite doesn't let you rename a table. 
> Tracking MySQL index changes is non-trivial.
> 
>  * On what basis do you assert that "developing inspection tools" -- 
> presumably for all three databases covered in weeks 4-6 -- will take 1 week?
> 
>  * If you're not working on tests until week 9-10, how do you plan to 
> establish that the work you do in week 1 actually works?
> 
> > Note: Work on Oracle and GIS may not be possible as part of GSoC
> >
> > I will personally consider my project to be successful if I have created and
> > tested at least the base API + PostgreSQL extension and inspection + version
> > tools.
> 
> If that's the case, then why does your schedule say you're going to complete 
> MySQL and SQLite, and possibly Oracle as well?
> 
> I can see that you're obviously enthused by this project, but as it stands, I 
> can't say this is a very compelling proposal.
> 
>  * It ignores the most recent activity in the area (last year's GSoC, in 
> particular)
> 
>  * It is extremely light on detail about how some very big pieces (like your 
> "versioning tools") will work
> 
>  * The proposed schedule reads more like a list of things you know you need 
> to do, not a detailed work breakdown backed by realistic estimates.
> 
> Thanks for taking the time to submit this proposal. I'd encourage you to have 
> a second swing at this. Read the recent discussions on the topic; take a look 
> at last year's GSoC proposal; and spend some time elaborating on the details 
> that I've highlighted.
> 
> Yours,
> Russ Magee %-)
> 
> --
> You received this message because you are subscribed to the Google Groups 
> "Django developers" group.
> To post to this group, send email to django-developers@googlegroups.com.
> To unsubscribe from this group, send email to 
> django-developers+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/django-developers?hl=en.