Re: Discussion related to ticket #26822 (new migrations, --keepdb and --parallel option)
Any feedback about my 2 proposals, to know if it is worth to spend time to propose a patch for one of those? Le mardi 5 juillet 2016 18:54:27 UTC+1, Romain Garrigues a écrit : > > > Markus, I like the idea, which is definitely better than my idea of new > option to recreate it manually when we know we have to. > > I can try to investigate a bit if you think that could lead to something > that makes sense. > Two ideas I have in mind after a quick look at migrate command line code: > 1/ Extract the code related to the "plan" (created by > executor.migration_plan(targets) function) to be also used somewhere else > (in clone_test_db for example...) > 2/ Make "migrate" command return a sort of report (number of migrations > applied, ...) of what happened during a "migrate" call, that could then be > used in db.backends.base.creation.BaseDatabaseCreation.create_test_db and > passed to the connection.creation.clone_test_db loop block, moving from > "django.test.runner.setup_databases" to > "db.backends.base.creation.BaseDatabaseCreation.create_test_db" (as cloned > databases have a link with the state of the default one, it can justify > this move). > > This parallel option is really great and coming with some environment > constraints, as you said Aymeric, but for big projects, the gain is so > impressive that I will do all I can to help on that! > > I have the benchmark in my todo list, do you think it makes sense to > update the current PR with one of these 2 propositions explained above? > > Romain. > > Le mardi 5 juillet 2016 12:48:38 UTC+1, Markus Holtermann a écrit : >> >> Hi, >> >> it might be a shot in the dark, but can't we check if Django's >> testrunner applied new migrations in which case we drop the cloned >> databases and recreate them. If all migrations already existed we keep >> the clones the way they are? >> >> /Markus >> >> On Tue, Jul 05, 2016 at 09:00:25AM +0200, Aymeric Augustin wrote: >> >Hello, >> > >> >I’ll try to clarify what I said in the PR below.. >> > >> >The main reason for the `--parallel` option was to make Django’s own >> test suite faster. A full run went down from ~8m to ~1m30 when I committed >> that patch, which really helps the development cycle on invasive patches. >> > >> >Since Django’s own test runner bypasses migrations, whenever you make >> changes to a model in Django’s test suite, you need a run without >> `--keepdb`. So the problem you’re describing doesn’t exist for the primary >> use case. >> > >> >Now let’s talk about models and migrations in users’ projects, which is >> the logical next step and the use case you’re trying to improve. >> > >> >Note that the `--parallel` option is experimental and often >> non-functional in this context: as soon as two tests hit a resource other >> than the database — say, the cache — they can stomp upon one another. >> > >> >> On 05 Jul 2016, at 00:22, Romain Garrigues >> wrote: >> >> >> >> We could have just documented this limitation, but I don't think that >> my situation is a really rare edge case in terms of process, so I was >> suggesting to add a new option to be able to reset the cloned databases if >> needed (let's name it --parallel-clone-reset). >> > >> >When I make changes to models, usually I keep removing and recreating a >> single migration, which is incompatible with using the `--keepdb` option. >> Whenever I make changes, I run without `--keepdb` once. >> > >> >> I don't really like the idea of adding a new option, as it impacts the >> test runner, the clone_test_db function signature, ... but I have not found >> a better idea to at the same time keep the performances with --keepdb and >> --parallel, and handle these newly added migrations to a project. >> > >> >I’m not a fan of a new option either… >> > >> >> To summarize my proposal, this option (--parallel-clone-reset, or any >> other name) should be used only if you are using --keepdb and --parallel >> options at the same time, and when you have added a new migration between 2 >> test run. >> > >> >IIRC this will more than double the run time of Django’s own test suite >> on MySQL: it will increase from ~2m to ~4m (give or take 30s) because >> cloning databases is slow on MySQL. >> > >> >I’m quoting all these figures from memory and I may mix them up. As I >> said on the ticket it would be useful to redo the benchmark on a first run >> and subsequent run of `./runtests.py` on PostgreSQL and MySQL. >> > >> >You could argue that it’s best to degrade the experience of a few Django >> contributors (original use case, Django’s test suite) for the benefits of >> the wider community (new use case, projects’ test suites). However the >> original use case is known to work and I don’t believe that the new use >> case works well enough in general, at least not without some engineering to >> isolate tests from one another. For this reason I’m not convinced by this >> argument. >> > >> >I
Re: Discussion related to ticket #26822 (new migrations, --keepdb and --parallel option)
Markus, I like the idea, which is definitely better than my idea of new option to recreate it manually when we know we have to. I can try to investigate a bit if you think that could lead to something that makes sense. Two ideas I have in mind after a quick look at migrate command line code: 1/ Extract the code related to the "plan" (created by executor.migration_plan(targets) function) to be also used somewhere else (in clone_test_db for example...) 2/ Make "migrate" command return a sort of report (number of migrations applied, ...) of what happened during a "migrate" call, that could then be used in db.backends.base.creation.BaseDatabaseCreation.create_test_db and passed to the connection.creation.clone_test_db loop block, moving from "django.test.runner.setup_databases" to "db.backends.base.creation.BaseDatabaseCreation.create_test_db" (as cloned databases have a link with the state of the default one, it can justify this move). This parallel option is really great and coming with some environment constraints, as you said Aymeric, but for big projects, the gain is so impressive that I will do all I can to help on that! I have the benchmark in my todo list, do you think it makes sense to update the current PR with one of these 2 propositions explained above? Romain. Le mardi 5 juillet 2016 12:48:38 UTC+1, Markus Holtermann a écrit : > > Hi, > > it might be a shot in the dark, but can't we check if Django's > testrunner applied new migrations in which case we drop the cloned > databases and recreate them. If all migrations already existed we keep > the clones the way they are? > > /Markus > > On Tue, Jul 05, 2016 at 09:00:25AM +0200, Aymeric Augustin wrote: > >Hello, > > > >I’ll try to clarify what I said in the PR below.. > > > >The main reason for the `--parallel` option was to make Django’s own test > suite faster. A full run went down from ~8m to ~1m30 when I committed that > patch, which really helps the development cycle on invasive patches. > > > >Since Django’s own test runner bypasses migrations, whenever you make > changes to a model in Django’s test suite, you need a run without > `--keepdb`. So the problem you’re describing doesn’t exist for the primary > use case. > > > >Now let’s talk about models and migrations in users’ projects, which is > the logical next step and the use case you’re trying to improve. > > > >Note that the `--parallel` option is experimental and often > non-functional in this context: as soon as two tests hit a resource other > than the database — say, the cache — they can stomp upon one another. > > > >> On 05 Jul 2016, at 00:22, Romain Garrigues > wrote: > >> > >> We could have just documented this limitation, but I don't think that > my situation is a really rare edge case in terms of process, so I was > suggesting to add a new option to be able to reset the cloned databases if > needed (let's name it --parallel-clone-reset). > > > >When I make changes to models, usually I keep removing and recreating a > single migration, which is incompatible with using the `--keepdb` option. > Whenever I make changes, I run without `--keepdb` once. > > > >> I don't really like the idea of adding a new option, as it impacts the > test runner, the clone_test_db function signature, ... but I have not found > a better idea to at the same time keep the performances with --keepdb and > --parallel, and handle these newly added migrations to a project. > > > >I’m not a fan of a new option either… > > > >> To summarize my proposal, this option (--parallel-clone-reset, or any > other name) should be used only if you are using --keepdb and --parallel > options at the same time, and when you have added a new migration between 2 > test run. > > > >IIRC this will more than double the run time of Django’s own test suite > on MySQL: it will increase from ~2m to ~4m (give or take 30s) because > cloning databases is slow on MySQL. > > > >I’m quoting all these figures from memory and I may mix them up. As I > said on the ticket it would be useful to redo the benchmark on a first run > and subsequent run of `./runtests.py` on PostgreSQL and MySQL. > > > >You could argue that it’s best to degrade the experience of a few Django > contributors (original use case, Django’s test suite) for the benefits of > the wider community (new use case, projects’ test suites). However the > original use case is known to work and I don’t believe that the new use > case works well enough in general, at least not without some engineering to > isolate tests from one another. For this reason I’m not convinced by this > argument. > > > >I hope this clarifies the context of the trade-off we’re discussing. > > > >Best regards, > > > >-- > >Aymeric. > > > >-- > >You received this message because you are subscribed to the Google Groups > "Django developers (Contributions to Django itself)" group. > >To unsubscribe from this group
Re: Discussion related to ticket #26822 (new migrations, --keepdb and --parallel option)
Hi, it might be a shot in the dark, but can't we check if Django's testrunner applied new migrations in which case we drop the cloned databases and recreate them. If all migrations already existed we keep the clones the way they are? /Markus On Tue, Jul 05, 2016 at 09:00:25AM +0200, Aymeric Augustin wrote: Hello, I’ll try to clarify what I said in the PR below.. The main reason for the `--parallel` option was to make Django’s own test suite faster. A full run went down from ~8m to ~1m30 when I committed that patch, which really helps the development cycle on invasive patches. Since Django’s own test runner bypasses migrations, whenever you make changes to a model in Django’s test suite, you need a run without `--keepdb`. So the problem you’re describing doesn’t exist for the primary use case. Now let’s talk about models and migrations in users’ projects, which is the logical next step and the use case you’re trying to improve. Note that the `--parallel` option is experimental and often non-functional in this context: as soon as two tests hit a resource other than the database — say, the cache — they can stomp upon one another. On 05 Jul 2016, at 00:22, Romain Garrigues wrote: We could have just documented this limitation, but I don't think that my situation is a really rare edge case in terms of process, so I was suggesting to add a new option to be able to reset the cloned databases if needed (let's name it --parallel-clone-reset). When I make changes to models, usually I keep removing and recreating a single migration, which is incompatible with using the `--keepdb` option. Whenever I make changes, I run without `--keepdb` once. I don't really like the idea of adding a new option, as it impacts the test runner, the clone_test_db function signature, ... but I have not found a better idea to at the same time keep the performances with --keepdb and --parallel, and handle these newly added migrations to a project. I’m not a fan of a new option either… To summarize my proposal, this option (--parallel-clone-reset, or any other name) should be used only if you are using --keepdb and --parallel options at the same time, and when you have added a new migration between 2 test run. IIRC this will more than double the run time of Django’s own test suite on MySQL: it will increase from ~2m to ~4m (give or take 30s) because cloning databases is slow on MySQL. I’m quoting all these figures from memory and I may mix them up. As I said on the ticket it would be useful to redo the benchmark on a first run and subsequent run of `./runtests.py` on PostgreSQL and MySQL. You could argue that it’s best to degrade the experience of a few Django contributors (original use case, Django’s test suite) for the benefits of the wider community (new use case, projects’ test suites). However the original use case is known to work and I don’t believe that the new use case works well enough in general, at least not without some engineering to isolate tests from one another. For this reason I’m not convinced by this argument. I hope this clarifies the context of the trade-off we’re discussing. Best regards, -- Aymeric. -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at https://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/BD5FB0D4-C9E1-4681-A5A1-CCD6D6BB84B7%40polytechnique.org. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at https://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/20160705114828.GA3506%40inel.local. For more options, visit https://groups.google.com/d/optout. signature.asc Description: PGP signature
Re: Discussion related to ticket #26822 (new migrations, --keepdb and --parallel option)
Hello, I’ll try to clarify what I said in the PR below.. The main reason for the `--parallel` option was to make Django’s own test suite faster. A full run went down from ~8m to ~1m30 when I committed that patch, which really helps the development cycle on invasive patches. Since Django’s own test runner bypasses migrations, whenever you make changes to a model in Django’s test suite, you need a run without `--keepdb`. So the problem you’re describing doesn’t exist for the primary use case. Now let’s talk about models and migrations in users’ projects, which is the logical next step and the use case you’re trying to improve. Note that the `--parallel` option is experimental and often non-functional in this context: as soon as two tests hit a resource other than the database — say, the cache — they can stomp upon one another. > On 05 Jul 2016, at 00:22, Romain Garrigues > wrote: > > We could have just documented this limitation, but I don't think that my > situation is a really rare edge case in terms of process, so I was suggesting > to add a new option to be able to reset the cloned databases if needed (let's > name it --parallel-clone-reset). When I make changes to models, usually I keep removing and recreating a single migration, which is incompatible with using the `--keepdb` option. Whenever I make changes, I run without `--keepdb` once. > I don't really like the idea of adding a new option, as it impacts the test > runner, the clone_test_db function signature, ... but I have not found a > better idea to at the same time keep the performances with --keepdb and > --parallel, and handle these newly added migrations to a project. I’m not a fan of a new option either… > To summarize my proposal, this option (--parallel-clone-reset, or any other > name) should be used only if you are using --keepdb and --parallel options at > the same time, and when you have added a new migration between 2 test run. IIRC this will more than double the run time of Django’s own test suite on MySQL: it will increase from ~2m to ~4m (give or take 30s) because cloning databases is slow on MySQL. I’m quoting all these figures from memory and I may mix them up. As I said on the ticket it would be useful to redo the benchmark on a first run and subsequent run of `./runtests.py` on PostgreSQL and MySQL. You could argue that it’s best to degrade the experience of a few Django contributors (original use case, Django’s test suite) for the benefits of the wider community (new use case, projects’ test suites). However the original use case is known to work and I don’t believe that the new use case works well enough in general, at least not without some engineering to isolate tests from one another. For this reason I’m not convinced by this argument. I hope this clarifies the context of the trade-off we’re discussing. Best regards, -- Aymeric. -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at https://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/BD5FB0D4-C9E1-4681-A5A1-CCD6D6BB84B7%40polytechnique.org. For more options, visit https://groups.google.com/d/optout.
Re: Discussion related to ticket #26822 (new migrations, --keepdb and --parallel option)
Hello, > On 5 juli 2016, at 00:22, Romain Garrigues > wrote: > After some investigation, I have seen that, in case of keepdb context, in > django/db/backends/*/creation.py, if the cloned databases already exist, we > don't touch them, which leads to this new field not created in cloned ones. > > I have proposed in the PR to rebuild the cloned databases, even with keepdb > option, to be sure that we always have the cloned databases with the latest > migration state. > > The problem with this method is that it will increase test database > initialization time, as we will now systematically copy all cloned databases, > even with --keepdb option (except the default one). We’ve been doing similar things in pytest-django (with the --reuse-db option and pytest-xdist) and faces similar problems. Currently you have to force-recreate the databases and then all processes will run migrations and it is very slow. I’ve been playing around with a solution to this: In my own project I create a template database and call it `test_myproject_`. Whenever a migration file changes (an existing file or a new migration file) - a new database is created and all clones are recreated. Currently this lives as a hack in my own code base, but I would like to explore this further and it could be a way forward. Here is my scripts that calculates the hash and creates the databases: https://gist.github.com/pelme/4b3dac475cd6b1dec4fd67d25d2e7cdc https://gist.github.com/pelme/4a3ad3a62b6244068ff63736342f9509 This method could be refined: It is not necessary to create a database with a new name every time migrations change. I.e. we could create a private table with a single row that contains the hash. This approach hashes only the migration files directly involved in migrations, if you are using a 3rd-party library that’s imported, that will not trigger a new migration run. As an end user of this the experience is quite nice: You only experience the migration/cloning slowness whenever migrations actually changed, otherwise everything is fast. You don’t have to remember any special command line options. Cheers, Andreas -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at https://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/5A553E81-F6FF-4A85-ADB3-01D66AE072B9%40pelme.se. For more options, visit https://groups.google.com/d/optout.