Re: Discussion related to ticket #26822 (new migrations, --keepdb and --parallel option)

2016-09-21 Thread Romain Garrigues
Any feedback on my two proposals? I would like to know whether it is worth 
spending time on a patch for one of them.


Re: Discussion related to ticket #26822 (new migrations, --keepdb and --parallel option)

2016-07-05 Thread Romain Garrigues

Markus, I like the idea; it is definitely better than my proposal of a new 
option to recreate the clones manually when we know we have to.

I can investigate a bit if you think that could lead to something 
that makes sense.
I have two ideas in mind after a quick look at the migrate command code:
1/ Extract the code related to the "plan" (created by the 
executor.migration_plan(targets) function) so that it can also be used 
elsewhere (in clone_test_db, for example).
2/ Make the "migrate" command return a sort of report (number of migrations 
applied, ...) of what happened during a "migrate" call. That report could then 
be used in db.backends.base.creation.BaseDatabaseCreation.create_test_db and 
passed to the connection.creation.clone_test_db loop block, which would move 
from "django.test.runner.setup_databases" to 
"db.backends.base.creation.BaseDatabaseCreation.create_test_db" (the cloned 
databases depend on the state of the default one, which can justify 
this move).

This parallel option is really great. It comes with some environment 
constraints, as you said, Aymeric, but for big projects the gain is so 
impressive that I will do all I can to help with it!

The benchmark is on my to-do list. Do you think it makes sense to update 
the current PR with one of the two proposals above?

Romain.


Re: Discussion related to ticket #26822 (new migrations, --keepdb and --parallel option)

2016-07-05 Thread Markus Holtermann

Hi,

it might be a shot in the dark, but can't we check whether Django's
test runner applied new migrations and, if so, drop the cloned
databases and recreate them? If all migrations had already been applied, we
would keep the clones the way they are.
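
A minimal sketch of that rule, not Django's actual code. It assumes the test runner can report which migrations it applied to the main test database during the current run; `clones_to_rebuild` and its parameters are hypothetical names chosen for illustration.

```python
def clones_to_rebuild(newly_applied, existing_clones, wanted_clones):
    """Return the names of the clones that must be dropped and recreated.

    newly_applied: migrations applied to the main test database in this run
    (hypothetical input; how the runner would collect it is left open).
    existing_clones: names of the clones that already exist.
    wanted_clones: names of the clones needed for this parallel run.
    """
    if newly_applied:
        # The schema of the main database changed, so every clone is stale.
        return list(wanted_clones)
    # Nothing changed: only create the clones that do not exist yet.
    return [name for name in wanted_clones if name not in existing_clones]


wanted = ["test_app_1", "test_app_2"]
# A new migration was applied: both clones are rebuilt.
print(clones_to_rebuild([("blog", "0002_add_field")], set(wanted), wanted))
# No new migration: only the missing clone is created.
print(clones_to_rebuild([], {"test_app_1"}, wanted))
```

With a rule like this, a run with no new migrations keeps the speed of --keepdb, and the cloning cost is paid only when the schema actually moved.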

/Markus

--
You received this message because you are subscribed to the Google Groups "Django 
developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/20160705114828.GA3506%40inel.local.
For more options, visit https://groups.google.com/d/optout.



Re: Discussion related to ticket #26822 (new migrations, --keepdb and --parallel option)

2016-07-05 Thread Aymeric Augustin
Hello,

I’ll try to clarify below what I said in the PR.

The main reason for the `--parallel` option was to make Django’s own test suite 
faster. A full run went down from ~8m to ~1m30 when I committed that patch, 
which really helps the development cycle on invasive patches.

Since Django’s own test runner bypasses migrations, whenever you make changes 
to a model in Django’s test suite, you need a run without `--keepdb`. So the 
problem you’re describing doesn’t exist for the primary use case.

Now let’s talk about models and migrations in users’ projects, which is the 
logical next step and the use case you’re trying to improve.

Note that the `--parallel` option is experimental and often non-functional in 
this context: as soon as two tests hit a resource other than the database — 
say, the cache — they can stomp upon one another.

> On 05 Jul 2016, at 00:22, Romain Garrigues wrote:
> 
> We could have just documented this limitation, but I don't think that my 
> situation is a really rare edge case in terms of process, so I was suggesting 
> to add a new option to be able to reset the cloned databases if needed (let's 
> name it --parallel-clone-reset).

When I make changes to models, usually I keep removing and recreating a single 
migration, which is incompatible with using the `--keepdb` option. Whenever I 
make changes, I run without `--keepdb` once.

> I don't really like the idea of adding a new option, as it impacts the test 
> runner, the clone_test_db function signature, ... but I have not found a 
> better way to both keep the performance of --keepdb and --parallel and 
> handle migrations newly added to a project.

I’m not a fan of a new option either…

> To summarize my proposal, this option (--parallel-clone-reset, or any other 
> name) should be used only if you are using the --keepdb and --parallel options 
> together and have added a new migration between two test runs.

IIRC this will more than double the run time of Django’s own test suite on 
MySQL: it will increase from ~2m to ~4m (give or take 30s) because cloning 
databases is slow on MySQL.

I’m quoting all these figures from memory and I may mix them up. As I said on 
the ticket, it would be useful to redo the benchmark on a first run and a 
subsequent run of `./runtests.py` on PostgreSQL and MySQL.
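
One way such a benchmark could be scripted is sketched below. It is an assumption, not something from the ticket: `time_command` and `benchmark` are hypothetical helpers, and the command in the `__main__` block is a stand-in so the sketch runs anywhere. For a real measurement it would be replaced by the `./runtests.py` invocation and settings under test.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start


def benchmark(cmd, runs=2):
    """Time consecutive invocations; the first one is the 'first run'."""
    return [time_command(cmd) for _ in range(runs)]


if __name__ == "__main__":
    # Stand-in command. For the real benchmark, substitute something like
    # ["./runtests.py", "--parallel", "--keepdb"] plus the settings module
    # for PostgreSQL or MySQL (assumed invocation, adjust to your setup).
    timings = benchmark([sys.executable, "-c", "pass"])
    for number, seconds in enumerate(timings, start=1):
        print(f"run {number}: {seconds:.2f}s")
```

Comparing the first and second timings per backend would show how much of the total is database creation and cloning rather than the tests themselves.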

You could argue that it’s best to degrade the experience of a few Django 
contributors (original use case, Django’s test suite) for the benefit of the 
wider community (new use case, projects’ test suites). However, the original use 
case is known to work, and I don’t believe that the new use case works well 
enough in general, at least not without some engineering to isolate tests from 
one another. For this reason I’m not convinced by this argument.

I hope this clarifies the context of the trade-off we’re discussing.

Best regards,

-- 
Aymeric.



Re: Discussion related to ticket #26822 (new migrations, --keepdb and --parallel option)

2016-07-04 Thread Andreas Pelme
Hello,

> On 5 July 2016, at 00:22, Romain Garrigues wrote:
> After some investigation, I have seen that, in the keepdb case, in 
> django/db/backends/*/creation.py, if the cloned databases already exist, we 
> don't touch them, so the new field is never created in the cloned ones.
> 
> I have proposed in the PR to rebuild the cloned databases, even with the keepdb 
> option, to be sure that the cloned databases always have the latest 
> migration state.
> 
> The problem with this method is that it will increase test database 
> initialization time, as we will now systematically copy all cloned databases, 
> even with the --keepdb option (except the default one).

We’ve been doing similar things in pytest-django (with the --reuse-db option 
and pytest-xdist) and face similar problems. Currently you have to 
force-recreate the databases, and then all processes run the migrations, which 
is very slow.

I’ve been playing around with a solution to this: in my own project I create a 
template database and call it `test_myproject_`. 
Whenever a migration file changes (an existing file is edited or a new migration 
file is added), a new database is created and all clones are recreated.

Currently this lives as a hack in my own code base, but I would like to explore 
it further, and it could be a way forward. Here are my scripts that calculate 
the hash and create the databases:
https://gist.github.com/pelme/4b3dac475cd6b1dec4fd67d25d2e7cdc
https://gist.github.com/pelme/4a3ad3a62b6244068ff63736342f9509

This method could be refined: it is not necessary to create a database with a 
new name every time migrations change. For instance, we could create a private 
table with a single row that contains the hash.
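
A minimal sketch of that refinement, under stated assumptions: it uses the standard library's `sqlite3` as a stand-in for the real test database, and the table name `test_migrations_hash` and all function names are hypothetical. It is not taken from the gists above.

```python
import sqlite3

HASH_TABLE = "test_migrations_hash"  # hypothetical name for the private table


def _ensure_table(conn):
    conn.execute(f"CREATE TABLE IF NOT EXISTS {HASH_TABLE} (digest TEXT NOT NULL)")


def stored_digest(conn):
    """Return the digest recorded in the database, or None if there is none."""
    _ensure_table(conn)
    row = conn.execute(f"SELECT digest FROM {HASH_TABLE}").fetchone()
    return row[0] if row else None


def record_digest(conn, digest):
    """Replace the single stored row with the current digest."""
    _ensure_table(conn)
    conn.execute(f"DELETE FROM {HASH_TABLE}")
    conn.execute(f"INSERT INTO {HASH_TABLE} (digest) VALUES (?)", (digest,))
    conn.commit()


def needs_rebuild(conn, current_digest):
    """True when the migrations changed since the digest was recorded."""
    return stored_digest(conn) != current_digest


conn = sqlite3.connect(":memory:")
if needs_rebuild(conn, "abc123"):
    # Here the real code would re-run migrations and recreate the clones.
    record_digest(conn, "abc123")
```

The database name then stays stable across migration changes, and the stored row alone decides whether the slow path is needed.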

This approach hashes only the migration files directly involved in migrations. 
If you are using a third-party library that is imported, changes to it will not 
trigger a new migration run.
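
The hashing step could look roughly like the sketch below. It is an illustration only, not the contents of the gists above: the `**/migrations/*.py` layout, the SHA-256 choice, the 12-character suffix, and the function names are all assumptions.

```python
import hashlib
from pathlib import Path


def migrations_digest(project_root):
    """Return a hex digest covering every migration file under project_root."""
    digest = hashlib.sha256()
    root = Path(project_root)
    # Sort the paths so the digest does not depend on filesystem ordering.
    for path in sorted(root.glob("**/migrations/*.py")):
        # Hash the relative path too, so renames and additions are detected.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def template_db_name(project_root, prefix="test_myproject_"):
    """Build a template database name from the prefix and a short digest."""
    return prefix + migrations_digest(project_root)[:12]
```

Only files inside a `migrations` directory feed the digest, so editing models or other code leaves the name unchanged, while editing or adding a migration produces a new one.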

As an end user, the experience is quite nice: you only pay the 
migration/cloning cost when migrations have actually changed; otherwise 
everything is fast. You don’t have to remember any special command line options.

Cheers,
Andreas
