Re: Some thoughts about improving migration squashing

2017-02-20 Thread Patryk Zawadzki
W dniu poniedziałek, 20 lutego 2017 11:49:38 UTC+1 użytkownik Markus 
Holtermann napisał:
>
> On Thu, Feb 16, 2017 at 03:25:16PM -0800, rap...@makeleaps.com 
>  wrote: 
> >RemoveField('A', 'foo') also references A and foo, but does it reference 
> B? 
> >if it does, then it' s hard to have optimizations that pass through this, 
> >because this field could be referencing any model (theoretically). 
>
> No, that field can not reference any model. It reference exactly one 
> model (that even holds for FK to abstract models as fields from abstract 
> models are inlined in the concrete models in migrations). However, 
> RemoveField doesn't have the information to "just" figure out the 
> referenced model. RemoveField would need to look into the from_state's 
> apps and actually even look into the actual field that's referred. 
> ForeignKeys have an implicit to_field attribute, so, having 
>
>   AddField('A', 'foo', ForeignKey('B', to_field='bar')) 
>
> a 
>
>   RemoveField('A', 'foo') 
>
> references exactly one field on one particular model. Not more and not 
> less. The issue here is that RemoveField needs to take that information 
> from the state and not from one of its attributes. 
>

Technically it references _some_ model named "B" that was created not 
sooner than the current migration's explicit dependencies. It may be the 
model you saw when you created that migration or it may be some other 
model. You can tell your migration that it has to be executed no sooner 
than after another migration is complete but there is no way to say "but 
before model B is modified any further".

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/b9964408-8dff-49ec-aa08-89826d35503c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Some thoughts about improving migration squashing

2017-02-20 Thread Markus Holtermann

On Thu, Feb 16, 2017 at 03:25:16PM -0800, raph...@makeleaps.com wrote:

When you have AddField('A', 'foo', ForeignKey('B')), this operation
references A and foo, but also references B.


correct.


RemoveField('A', 'foo') also references A and foo, but does it reference B?
if it does, then it' s hard to have optimizations that pass through this,
because this field could be referencing any model (theoretically).


No, that field can not reference any model. It reference exactly one
model (that even holds for FK to abstract models as fields from abstract
models are inlined in the concrete models in migrations). However,
RemoveField doesn't have the information to "just" figure out the
referenced model. RemoveField would need to look into the from_state's
apps and actually even look into the actual field that's referred.
ForeignKeys have an implicit to_field attribute, so, having

 AddField('A', 'foo', ForeignKey('B', to_field='bar'))

a

 RemoveField('A', 'foo')

references exactly one field on one particular model. Not more and not
less. The issue here is that RemoveField needs to take that information
from the state and not from one of its attributes.

/Markus



But if we assert that RemoveField doesn't refer to any models referenced to
by its field, then our optimizer can take a couple more liberties.

Raphael

On Friday, February 17, 2017 at 2:15:47 AM UTC+9, Markus Holtermann wrote:


I'm not sure if it's related or not wo what you're investigating,
RemoveField cannot "just" optimized through, as you might have another
AddField operation afterwards adding another field with the same name.

/Markus

On Thu, Feb 16, 2017 at 08:19:01AM -0800, rap...@makeleaps.com
 wrote:
>Hey Simon,
>
> I looked through your PR and added a couple comments. The main thing is
I
>think we can actually ignore the field context on "RemoveField", if only
>because the executor doesn't need it. Even though the field might be
>pointing to a related model, that doesn't prevent being optimized
through.
>
> This is hard to explain, but intuitively, each "RemoveField" is paired
>with an "AddField" or "CreateModel" that *does *depend on the related
>model. So if we have a potentially dangerous optimization, those initial
>operations will "protect" the causal order, not "RemoveField".
>
> Raphael
>
>--
>You received this message because you are subscribed to the Google Groups
"Django developers  (Contributions to Django itself)" group.
>To unsubscribe from this group and stop receiving emails from it, send an
email to django-develop...@googlegroups.com .
>To post to this group, send email to django-d...@googlegroups.com
.
>Visit this group at https://groups.google.com/group/django-developers.
>To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/9dfdcec6-b98c-44f2-86af-99aaa8857cc9%40googlegroups.com.

>For more options, visit https://groups.google.com/d/optout.




--
You received this message because you are subscribed to the Google Groups "Django 
developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/7515c909-a015-451d-bdaa-f040e6322166%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google Groups "Django 
developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/20170220104923.GA17288%40inel.local.
For more options, visit https://groups.google.com/d/optout.


signature.asc
Description: PGP signature


Re: Some thoughts about improving migration squashing

2017-02-17 Thread charettes
Hello Raphael,

Thanks for your detailed explanation! You clearly expressed why it's safe to
optimize through RemoveField operations and helped me lift any doubt about
what was a wrong assumption[1].

I gave your two passes optimization strategy a try and I believe I managed 
to
implement it correctly[2].

The plan is to modify the optimization algorithm to start by trying to 
perform
a "right" (as right of the migration in between) reduction and attempt a 
"left"
one if a subsequent operation can be reduced with. The "left" reduction will
only be performed if all the operations in between can optimize through 
(their
reduce() method returns True) the operation we're trying to perform a 
reduction
with.

Cheers,
Simon

[1] 
https://github.com/django/django/pull/7999/commits/6d0b740cbc9ae038f9eef95dac3057dccb283e6d
[2] https://github.com/django/django/pull/7999

Le vendredi 17 février 2017 01:14:58 UTC-5, Gaschignard, Raphael a écrit :
>
> To clarify on my previous post, if we're in the first case, then the +M 
> and -F operations can be optimized in one path to remove the dependency, 
> and then the +M' and -M' operation can be optimized through.
>
> In the second case (with an AddField operation), the +F and -F operations 
> will cancel each other out (since they will be "in between" +M' and -M') 
> and then the optimization can happen.
>
> A bit more holistically, since M's field depends on M' in the code (which 
> is where these operations come from, after all), in order to remove M', you 
> would first need to remove M's field, so you're almost guaranteed that a 
> RemoveField will be "in between" any of its dependency's creations/removals.
>
> If we have the field information in the RemoveField, we could check it. 
> But if we do not make an assumption on (no field information) RemoveFields, 
> it blocks a lot of possible optimizations. We could just do that (start 
> generating RemoveField with field info), though it would not allow for 
> older migrations to get optimized. I'd be good with doing either/both. 
>
> Raphael
>
> On Fri, Feb 17, 2017 at 3:09 PM, Gaschignard, Raphael <
> rap...@makeleaps.com > wrote:
>
>> Hi Simon,
>>
>>I think it's a bit more general than that. Why does the `RemoveField` 
>> exist? Because somewhere, an `AddField`-esque operation exists before it, 
>> right?
>>
>>   Let's say we have m, m' as models.
>>
>>   Let -F be a RemoveField(m, 'foo', ForeignKey(m') operation. We also 
>> have two operations +M', a CreateModel(m'), and -M', a RemoveModel(m') 
>> operation. Because we have a removeField operation, we also have +M, a 
>> "CreateModel(m), somewhere
>>
>>  Let's assume that we have an operations list like:   [ ...(1)..., +M', 
>> ...(2)..., -F, ...(3)..., -M']. How do we know that we can reduce +M' and 
>> -M' together "through" -F?
>>
>> Because we have -F, we have +M somewhere. because m has a field for m', 
>> either: 
>>1 - The field is within the initial +M operation. Because the 
>> definition depends on M', it must be in group (2).
>>2 - The field is not within the initial +M operation. Because the 
>> definition of the field depends on M', we need an AddField operation (or 
>> moral equivalent) +F in group (2)
>>
>> So, if we have -F in between +M' and -M', we will also have either a +M 
>> which depends on M' or a +F which depends on M' within (2).
>>
>> So if the -F is between the two, there will also be another operation 
>> that will be present between the two that expresses the same dependencies. 
>> So the -F operation itself can avoid expressing its "RemoveField" indirect 
>> dependency to m' because another operation will do it for them.
>>
>> 
>>
>> It's hard to generalize this across everything because there are, after 
>> all, arbitrary migrations. But considering that RemoveField will only be 
>> reduced with a "moral equivalent" to AddField, I think we can expand this 
>> reasoning across everything to say that RemoveField will be sufficiently 
>> protected by the AddField operation's location (which will have to be after 
>> +M').
>>
>>
>> Raphael
>>
>>
>> On Fri, Feb 17, 2017 at 12:28 PM, charettes > > wrote:
>>
>>> > RemoveField('A', 'foo') also references A and foo, but does it 
>>> reference B? if it does, then it' s hard to have optimizations that pass 
>>> through this, because this field could be referencing any model 
>>> (theoretically).
>>>
>>> I think we all agree on that.
>>>
>>> > But if we assert that RemoveField doesn't refer to any models 
>>> referenced to by its field, then our optimizer can take a couple more 
>>> liberties.
>>>
>>> Do you have suggestion on how we can assert that it's the case? The only 
>>> way I could come up with was to make sure RemoveField has a reference to 
>>> the field it's removing. e.g. It would be generated in the form 
>>> `RemoveField('A', 'foo', ForeignKey('B'))`.
>>>
>>> Simon
>>>
>>>
>>> Le jeudi 16 février 2017 18:25:16 UTC-5, rap...@makeleaps.com a écrit :

 When you 

Re: Some thoughts about improving migration squashing

2017-02-16 Thread Gaschignard, Raphael
To clarify on my previous post, if we're in the first case, then the +M and
-F operations can be optimized in one path to remove the dependency, and
then the +M' and -M' operation can be optimized through.

In the second case (with an AddField operation), the +F and -F operations
will cancel each other out (since they will be "in between" +M' and -M')
and then the optimization can happen.

A bit more holistically, since M's field depends on M' in the code (which
is where these operations come from, after all), in order to remove M', you
would first need to remove M's field, so you're almost guaranteed that a
RemoveField will be "in between" any of its dependency's creations/removals.

If we have the field information in the RemoveField, we could check it. But
if we do not make an assumption on (no field information) RemoveFields, it
blocks a lot of possible optimizations. We could just do that (start
generating RemoveField with field info), though it would not allow for
older migrations to get optimized. I'd be good with doing either/both.

Raphael

On Fri, Feb 17, 2017 at 3:09 PM, Gaschignard, Raphael  wrote:

> Hi Simon,
>
>I think it's a bit more general than that. Why does the `RemoveField`
> exist? Because somewhere, an `AddField`-esque operation exists before it,
> right?
>
>   Let's say we have m, m' as models.
>
>   Let -F be a RemoveField(m, 'foo', ForeignKey(m') operation. We also have
> two operations +M', a CreateModel(m'), and -M', a RemoveModel(m')
> operation. Because we have a removeField operation, we also have +M, a
> "CreateModel(m), somewhere
>
>  Let's assume that we have an operations list like:   [ ...(1)..., +M',
> ...(2)..., -F, ...(3)..., -M']. How do we know that we can reduce +M' and
> -M' together "through" -F?
>
> Because we have -F, we have +M somewhere. because m has a field for m',
> either:
>1 - The field is within the initial +M operation. Because the
> definition depends on M', it must be in group (2).
>2 - The field is not within the initial +M operation. Because the
> definition of the field depends on M', we need an AddField operation (or
> moral equivalent) +F in group (2)
>
> So, if we have -F in between +M' and -M', we will also have either a +M
> which depends on M' or a +F which depends on M' within (2).
>
> So if the -F is between the two, there will also be another operation that
> will be present between the two that expresses the same dependencies. So
> the -F operation itself can avoid expressing its "RemoveField" indirect
> dependency to m' because another operation will do it for them.
>
> 
>
> It's hard to generalize this across everything because there are, after
> all, arbitrary migrations. But considering that RemoveField will only be
> reduced with a "moral equivalent" to AddField, I think we can expand this
> reasoning across everything to say that RemoveField will be sufficiently
> protected by the AddField operation's location (which will have to be after
> +M').
>
>
> Raphael
>
>
> On Fri, Feb 17, 2017 at 12:28 PM, charettes  wrote:
>
>> > RemoveField('A', 'foo') also references A and foo, but does it
>> reference B? if it does, then it' s hard to have optimizations that pass
>> through this, because this field could be referencing any model
>> (theoretically).
>>
>> I think we all agree on that.
>>
>> > But if we assert that RemoveField doesn't refer to any models
>> referenced to by its field, then our optimizer can take a couple more
>> liberties.
>>
>> Do you have suggestion on how we can assert that it's the case? The only
>> way I could come up with was to make sure RemoveField has a reference to
>> the field it's removing. e.g. It would be generated in the form
>> `RemoveField('A', 'foo', ForeignKey('B'))`.
>>
>> Simon
>>
>>
>> Le jeudi 16 février 2017 18:25:16 UTC-5, rap...@makeleaps.com a écrit :
>>>
>>> When you have AddField('A', 'foo', ForeignKey('B')), this operation
>>> references A and foo, but also references B.
>>>
>>> RemoveField('A', 'foo') also references A and foo, but does it reference
>>> B? if it does, then it' s hard to have optimizations that pass through
>>> this, because this field could be referencing any model (theoretically).
>>>
>>> But if we assert that RemoveField doesn't refer to any models referenced
>>> to by its field, then our optimizer can take a couple more liberties.
>>>
>>> Raphael
>>>
>>> On Friday, February 17, 2017 at 2:15:47 AM UTC+9, Markus Holtermann
>>> wrote:

 I'm not sure if it's related or not wo what you're investigating,
 RemoveField cannot "just" optimized through, as you might have another
 AddField operation afterwards adding another field with the same name.

 /Markus

 On Thu, Feb 16, 2017 at 08:19:01AM -0800, rap...@makeleaps.com wrote:
 >Hey Simon,
 >
 > I looked through your PR and added a couple comments. The main thing
 is I
 >think we can actually ignore the field context on "RemoveField", if
 only
 >becaus

Re: Some thoughts about improving migration squashing

2017-02-16 Thread Gaschignard, Raphael
Hi Simon,

   I think it's a bit more general than that. Why does the `RemoveField`
exist? Because somewhere, an `AddField`-esque operation exists before it,
right?

  Let's say we have m, m' as models.

  Let -F be a RemoveField(m, 'foo', ForeignKey(m') operation. We also have
two operations +M', a CreateModel(m'), and -M', a RemoveModel(m')
operation. Because we have a removeField operation, we also have +M, a
"CreateModel(m), somewhere

 Let's assume that we have an operations list like:   [ ...(1)..., +M',
...(2)..., -F, ...(3)..., -M']. How do we know that we can reduce +M' and
-M' together "through" -F?

Because we have -F, we have +M somewhere. because m has a field for m',
either:
   1 - The field is within the initial +M operation. Because the definition
depends on M', it must be in group (2).
   2 - The field is not within the initial +M operation. Because the
definition of the field depends on M', we need an AddField operation (or
moral equivalent) +F in group (2)

So, if we have -F in between +M' and -M', we will also have either a +M
which depends on M' or a +F which depends on M' within (2).

So if the -F is between the two, there will also be another operation that
will be present between the two that expresses the same dependencies. So
the -F operation itself can avoid expressing its "RemoveField" indirect
dependency to m' because another operation will do it for them.



It's hard to generalize this across everything because there are, after
all, arbitrary migrations. But considering that RemoveField will only be
reduced with a "moral equivalent" to AddField, I think we can expand this
reasoning across everything to say that RemoveField will be sufficiently
protected by the AddField operation's location (which will have to be after
+M').


Raphael


On Fri, Feb 17, 2017 at 12:28 PM, charettes  wrote:

> > RemoveField('A', 'foo') also references A and foo, but does it reference
> B? if it does, then it' s hard to have optimizations that pass through
> this, because this field could be referencing any model (theoretically).
>
> I think we all agree on that.
>
> > But if we assert that RemoveField doesn't refer to any models referenced
> to by its field, then our optimizer can take a couple more liberties.
>
> Do you have suggestion on how we can assert that it's the case? The only
> way I could come up with was to make sure RemoveField has a reference to
> the field it's removing. e.g. It would be generated in the form
> `RemoveField('A', 'foo', ForeignKey('B'))`.
>
> Simon
>
>
> Le jeudi 16 février 2017 18:25:16 UTC-5, rap...@makeleaps.com a écrit :
>>
>> When you have AddField('A', 'foo', ForeignKey('B')), this operation
>> references A and foo, but also references B.
>>
>> RemoveField('A', 'foo') also references A and foo, but does it reference
>> B? if it does, then it' s hard to have optimizations that pass through
>> this, because this field could be referencing any model (theoretically).
>>
>> But if we assert that RemoveField doesn't refer to any models referenced
>> to by its field, then our optimizer can take a couple more liberties.
>>
>> Raphael
>>
>> On Friday, February 17, 2017 at 2:15:47 AM UTC+9, Markus Holtermann wrote:
>>>
>>> I'm not sure if it's related or not wo what you're investigating,
>>> RemoveField cannot "just" optimized through, as you might have another
>>> AddField operation afterwards adding another field with the same name.
>>>
>>> /Markus
>>>
>>> On Thu, Feb 16, 2017 at 08:19:01AM -0800, rap...@makeleaps.com wrote:
>>> >Hey Simon,
>>> >
>>> > I looked through your PR and added a couple comments. The main thing
>>> is I
>>> >think we can actually ignore the field context on "RemoveField", if
>>> only
>>> >because the executor doesn't need it. Even though the field might be
>>> >pointing to a related model, that doesn't prevent being optimized
>>> through.
>>> >
>>> > This is hard to explain, but intuitively, each "RemoveField" is paired
>>> >with an "AddField" or "CreateModel" that *does *depend on the related
>>> >model. So if we have a potentially dangerous optimization, those
>>> initial
>>> >operations will "protect" the causal order, not "RemoveField".
>>> >
>>> > Raphael
>>> >
>>> >--
>>> >You received this message because you are subscribed to the Google
>>> Groups "Django developers  (Contributions to Django itself)" group.
>>> >To unsubscribe from this group and stop receiving emails from it, send
>>> an email to django-develop...@googlegroups.com.
>>> >To post to this group, send email to django-d...@googlegroups.com.
>>> >Visit this group at https://groups.google.com/group/django-developers.
>>> >To view this discussion on the web visit https://groups.google.com/d/ms
>>> gid/django-developers/9dfdcec6-b98c-44f2-86af-99aaa8857cc9%
>>> 40googlegroups.com.
>>> >For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contribut

Re: Some thoughts about improving migration squashing

2017-02-16 Thread charettes
> RemoveField('A', 'foo') also references A and foo, but does it reference 
B? if it does, then it' s hard to have optimizations that pass through 
this, because this field could be referencing any model (theoretically).

I think we all agree on that.

> But if we assert that RemoveField doesn't refer to any models referenced 
to by its field, then our optimizer can take a couple more liberties.

Do you have suggestion on how we can assert that it's the case? The only 
way I could come up with was to make sure RemoveField has a reference to 
the field it's removing. e.g. It would be generated in the form 
`RemoveField('A', 'foo', ForeignKey('B'))`.

Simon

Le jeudi 16 février 2017 18:25:16 UTC-5, rap...@makeleaps.com a écrit :
>
> When you have AddField('A', 'foo', ForeignKey('B')), this operation 
> references A and foo, but also references B. 
>
> RemoveField('A', 'foo') also references A and foo, but does it reference 
> B? if it does, then it' s hard to have optimizations that pass through 
> this, because this field could be referencing any model (theoretically).
>
> But if we assert that RemoveField doesn't refer to any models referenced 
> to by its field, then our optimizer can take a couple more liberties.
>
> Raphael
>
> On Friday, February 17, 2017 at 2:15:47 AM UTC+9, Markus Holtermann wrote:
>>
>> I'm not sure if it's related or not wo what you're investigating, 
>> RemoveField cannot "just" optimized through, as you might have another 
>> AddField operation afterwards adding another field with the same name. 
>>
>> /Markus 
>>
>> On Thu, Feb 16, 2017 at 08:19:01AM -0800, rap...@makeleaps.com wrote: 
>> >Hey Simon, 
>> > 
>> > I looked through your PR and added a couple comments. The main thing is 
>> I 
>> >think we can actually ignore the field context on "RemoveField", if only 
>> >because the executor doesn't need it. Even though the field might be 
>> >pointing to a related model, that doesn't prevent being optimized 
>> through. 
>> > 
>> > This is hard to explain, but intuitively, each "RemoveField" is paired 
>> >with an "AddField" or "CreateModel" that *does *depend on the related 
>> >model. So if we have a potentially dangerous optimization, those initial 
>> >operations will "protect" the causal order, not "RemoveField". 
>> > 
>> > Raphael 
>> > 
>> >-- 
>> >You received this message because you are subscribed to the Google 
>> Groups "Django developers  (Contributions to Django itself)" group. 
>> >To unsubscribe from this group and stop receiving emails from it, send 
>> an email to django-develop...@googlegroups.com. 
>> >To post to this group, send email to django-d...@googlegroups.com. 
>> >Visit this group at https://groups.google.com/group/django-developers. 
>> >To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/django-developers/9dfdcec6-b98c-44f2-86af-99aaa8857cc9%40googlegroups.com.
>>  
>>
>> >For more options, visit https://groups.google.com/d/optout. 
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/1805b163-4d47-4ace-b9d3-79b6346126cd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Some thoughts about improving migration squashing

2017-02-16 Thread raphael
When you have AddField('A', 'foo', ForeignKey('B')), this operation 
references A and foo, but also references B. 

RemoveField('A', 'foo') also references A and foo, but does it reference B? 
if it does, then it' s hard to have optimizations that pass through this, 
because this field could be referencing any model (theoretically).

But if we assert that RemoveField doesn't refer to any models referenced to 
by its field, then our optimizer can take a couple more liberties.

Raphael

On Friday, February 17, 2017 at 2:15:47 AM UTC+9, Markus Holtermann wrote:
>
> I'm not sure if it's related or not wo what you're investigating, 
> RemoveField cannot "just" optimized through, as you might have another 
> AddField operation afterwards adding another field with the same name. 
>
> /Markus 
>
> On Thu, Feb 16, 2017 at 08:19:01AM -0800, rap...@makeleaps.com 
>  wrote: 
> >Hey Simon, 
> > 
> > I looked through your PR and added a couple comments. The main thing is 
> I 
> >think we can actually ignore the field context on "RemoveField", if only 
> >because the executor doesn't need it. Even though the field might be 
> >pointing to a related model, that doesn't prevent being optimized 
> through. 
> > 
> > This is hard to explain, but intuitively, each "RemoveField" is paired 
> >with an "AddField" or "CreateModel" that *does *depend on the related 
> >model. So if we have a potentially dangerous optimization, those initial 
> >operations will "protect" the causal order, not "RemoveField". 
> > 
> > Raphael 
> > 
> >-- 
> >You received this message because you are subscribed to the Google Groups 
> "Django developers  (Contributions to Django itself)" group. 
> >To unsubscribe from this group and stop receiving emails from it, send an 
> email to django-develop...@googlegroups.com . 
> >To post to this group, send email to django-d...@googlegroups.com 
> . 
> >Visit this group at https://groups.google.com/group/django-developers. 
> >To view this discussion on the web visit 
> https://groups.google.com/d/msgid/django-developers/9dfdcec6-b98c-44f2-86af-99aaa8857cc9%40googlegroups.com.
>  
>
> >For more options, visit https://groups.google.com/d/optout. 
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/7515c909-a015-451d-bdaa-f040e6322166%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Some thoughts about improving migration squashing

2017-02-16 Thread Markus Holtermann

I'm not sure if it's related or not wo what you're investigating,
RemoveField cannot "just" optimized through, as you might have another
AddField operation afterwards adding another field with the same name.

/Markus

On Thu, Feb 16, 2017 at 08:19:01AM -0800, raph...@makeleaps.com wrote:

Hey Simon,

I looked through your PR and added a couple comments. The main thing is I
think we can actually ignore the field context on "RemoveField", if only
because the executor doesn't need it. Even though the field might be
pointing to a related model, that doesn't prevent being optimized through.

This is hard to explain, but intuitively, each "RemoveField" is paired
with an "AddField" or "CreateModel" that *does *depend on the related
model. So if we have a potentially dangerous optimization, those initial
operations will "protect" the causal order, not "RemoveField".

Raphael

--
You received this message because you are subscribed to the Google Groups "Django 
developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/9dfdcec6-b98c-44f2-86af-99aaa8857cc9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google Groups "Django 
developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/20170216171532.GE8346%40inel.local.
For more options, visit https://groups.google.com/d/optout.


signature.asc
Description: PGP signature


Re: Some thoughts about improving migration squashing

2017-02-16 Thread raphael
Hey Simon, 

 I looked through your PR and added a couple comments. The main thing is I 
think we can actually ignore the field context on "RemoveField", if only 
because the executor doesn't need it. Even though the field might be 
pointing to a related model, that doesn't prevent being optimized through. 

 This is hard to explain, but intuitively, each "RemoveField" is paired 
with an "AddField" or "CreateModel" that *does *depend on the related 
model. So if we have a potentially dangerous optimization, those initial 
operations will "protect" the causal order, not "RemoveField".

 Raphael

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/9dfdcec6-b98c-44f2-86af-99aaa8857cc9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Some thoughts about improving migration squashing

2017-02-15 Thread charettes
Hi Raphael,

I've been working on making the optimizer a bit smarter recently and came 
to the
same conclusion as you concerning the "left" and "right" optimizations.

This should be possible to solve by allowing `Operation.reduce()` to return 
the
full list of operations it replaces by appending `in_betwen` before or 
after the
operations it combines to perform "left" or "right" optimizations.

In the mean time I have a solution that makes the optimizer perform a bit 
better
that I'd be glad to get your feedback on[1]. One thing bugging me was that
`RemoveField` operations didn't have any references to definition of the 
field
they were removing, making optimization "through" them unsafe as they could 
be
operating on a related field.

Cheers,
Simon

[1] https://github.com/django/django/pull/7999/files

Le mercredi 15 février 2017 07:22:11 UTC-5, rap...@makeleaps.com a écrit :
>
> Hello everyone,
>
>I spent a couple hours last night trying to improve the migration 
> squasher optimizer (migrations were taking almost 15 minutes in CI). I came 
> up with a couple ideas for anyone interested in improvements:
>  
>  1- Having an interactive mode for squashing would be interested. 
> Currently, when squashing migrations, I do the following: 
>
>- Generate an initial squash
>- Edit it (namely, move around operations to get more optimizations to 
>work)
>- remove the "replaces" tag, then rerun migration squashing to 
>"re-optimize"
>- repeat until I get something I like, then add the original 
>"replaces" tag
>
>It would be cool if instead, the process were (with a flag):
>
>- Generate an initial squash, but have the process wait for 
>confirmation to "commit" this squash as final (though writing out the file)
>- Edit the file, and tell the process to try re-optimizing with the 
>same file (getting around the "no-squash of squashes" rule)
>- Potentially, allow us to also step back
>
>  For example, the "squashmigrations" command output could look like:
>
> generated 0001_squashed_mig.py
>> optmize migration[yN]? 
>> 
>> regenerated 0001_squashed_mig.py
>> ( 20 operations -> 10 operations)
>> optimize migration[Ynr]? 
>> 
>> regenerated 0001_squashed_mig.py
>> ( No change in operation count)
>> optimize migration[Ynr]? 
>> 
>> rolled back to previous version
>> optimize migration[Ynr]? 
>> 
>> Saved migration
>>
>
>
> A simpler version of this command would simply be to add an 
> "optimizemigration" command that just reads in a single migration and 
> optimizes the operations, without touching any of the squashiness. 
>
>
>  2- The reducer might be a bit too pessimistic
>
>  Currently, the optimizer lets "reduce" operations (that take 2 operations 
> and return 0,1, or 2 operations, or None if nothing can be change) do 
> whatever they want. Because of that, if you have [A, B, C ,D] and B depends 
> on A, you can't reduce A and C because the reduction might remove A.
>
>  In reality there are two kinds of reduction operations that we could be 
> taking into account:
>
>- reducing "left". if you have [A, B, C, D] (for example, A is a 
> CreateModel , C is an AddField for the same model),  and you can reduce A 
> and C into just A' (A with C), giving [A', B, D], then it doesn't matter 
> that B depends on A. 
>
> The thing that matters is if C depends on B (for example, C adds a foreign 
> key to a model created in B). This is actually already encoded in the 
> CreateMode + AddField reduction, but is perhaps a more general case.
>
> In a sense, reducing A and C "to the left" means that we're bringing A and 
> C closer together only by moving C. This is a major part of the potential 
> reductions that the current optimizer is missing.
>
>- reducing right. If you have [A, B, C, D] (for example, A is a 
> CreateModel, C is a RemoveModel for the same model), and you can reduce A 
> and C into just C' (C with A), giving [B, C', D], then it does matter that 
> B depends on A. C can't depend on B (assuming causality holds in our 
> universe)
>
> This is the current mechanism, essentially. If B depends on A, then you 
> can't move A past B. 
>
>  Removing both operations is a special case of reducing right (You can 
> make C' into a no-op).
>
> I had monkeypatched a special case of  reducing left (taking CreateModel, 
> AddField of different models and swapping them . For example 
> [CreateModel(A), CreateModel(B), AddField(A.foo)] -> [CreateModel(A), 
> AddField(A.foo), CreateModel(B)]) and got decent results, but I think 
> making the optimization code express these two concepts separately would 
> catch even more of the optimizations I saw that the optimizer didn't.
>
>
> I hope some of this is useful 
>  
>   Raphael
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developer

Re: Some thoughts about improving migration squashing

2017-02-15 Thread Andrew Godwin
>
>
> You're essentially speaking about 2 things here, in my opinion:
>
> 1. Adding a new feature for interactive squash
> 2. Improving the MigrationOptimizer
>
> I certainly see a point for 2. Not sure how much for 1. Anyway, your
> reasoning for 2 sounds great! I'd be more than happy if you want to get
> this into Django! Can you create a respective ticket and drop your
> explanation in there, please :)
>
>
I am also definitely in support of 2), and I can see 1) being useful for
migration sets that have a lot of operations the optimiser won't touch by
default (e.g. SQL), but it's probably more work to be less generally useful.

Andrew


>
> On Wednesday, February 15, 2017 at 1:22:11 PM UTC+1, rap...@makeleaps.com
> wrote:
>>
>> Hello everyone,
>>
>>I spent a couple hours last night trying to improve the migration
>> squasher optimizer (migrations were taking almost 15 minutes in CI). I came
>> up with a couple ideas for anyone interested in improvements:
>>
>>  1- Having an interactive mode for squashing would be interested.
>> Currently, when squashing migrations, I do the following:
>>
>>- Generate an initial squash
>>- Edit it (namely, move around operations to get more optimizations
>>to work)
>>- remove the "replaces" tag, then rerun migration squashing to
>>"re-optimize"
>>- repeat until I get something I like, then add the original
>>"replaces" tag
>>
>>It would be cool if instead, the process were (with a flag):
>>
>>- Generate an initial squash, but have the process wait for
>>confirmation to "commit" this squash as final (though writing out the 
>> file)
>>- Edit the file, and tell the process to try re-optimizing with the
>>same file (getting around the "no-squash of squashes" rule)
>>- Potentially, allow us to also step back
>>
>>  For example, the "squashmigrations" command output could look like:
>>
>> generated 0001_squashed_mig.py
>>> optmize migration[yN]?
>>> 
>>> regenerated 0001_squashed_mig.py
>>> ( 20 operations -> 10 operations)
>>> optimize migration[Ynr]?
>>> 
>>> regenerated 0001_squashed_mig.py
>>> ( No change in operation count)
>>> optimize migration[Ynr]?
>>> 
>>> rolled back to previous version
>>> optimize migration[Ynr]?
>>> 
>>> Saved migration
>>>
>>
>>
>> A simpler version of this command would simply be to add an
>> "optimizemigration" command that just reads in a single migration and
>> optimizes the operations, without touching any of the squashiness.
>>
>>
>>  2- The reducer might be a bit too pessimistic
>>
>>  Currently, the optimizer lets "reduce" operations (that take 2
>> operations and return 0,1, or 2 operations, or None if nothing can be
>> change) do whatever they want. Because of that, if you have [A, B, C ,D]
>> and B depends on A, you can't reduce A and C because the reduction might
>> remove A.
>>
>>  In reality there are two kinds of reduction operations that we could be
>> taking into account:
>>
>>- reducing "left". if you have [A, B, C, D] (for example, A is a
>> CreateModel , C is an AddField for the same model),  and you can reduce A
>> and C into just A' (A with C), giving [A', B, D], then it doesn't matter
>> that B depends on A.
>>
>> The thing that matters is if C depends on B (for example, C adds a
>> foreign key to a model created in B). This is actually already encoded in
>> the CreateMode + AddField reduction, but is perhaps a more general case.
>>
>> In a sense, reducing A and C "to the left" means that we're bringing A
>> and C closer together only by moving C. This is a major part of the
>> potential reductions that the current optimizer is missing.
>>
>>- reducing right. If you have [A, B, C, D] (for example, A is a
>> CreateModel, C is a RemoveModel for the same model), and you can reduce A
>> and C into just C' (C with A), giving [B, C', D], then it does matter that
>> B depends on A. C can't depend on B (assuming causality holds in our
>> universe)
>>
>> This is the current mechanism, essentially. If B depends on A, then you
>> can't move A past B.
>>
>>  Removing both operations is a special case of reducing right (You can
>> make C' into a no-op).
>>
>> I had monkeypatched a special case of  reducing left (taking CreateModel,
>> AddField of different models and swapping them . For example
>> [CreateModel(A), CreateModel(B), AddField(A.foo)] -> [CreateModel(A),
>> AddField(A.foo), CreateModel(B)]) and got decent results, but I think
>> making the optimization code express these two concepts separately would
>> catch even more of the optimizations I saw that the optimizer didn't.
>>
>>
>> I hope some of this is useful
>>
>>   Raphael
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@goo

Re: Some thoughts about improving migration squashing

2017-02-15 Thread raphael
Markus: Now that I've written a command to optimize a single migration 
file, I think that it's sufficient for the "squash, edit, optimize" 
workflow that I was doing before. It's more about offering people to get 
their squashing done well until our optimizer becomes omniscient. 

Florian: Having the command run on all migrations should be 
straightforward, I'll look into that!

Tim: It's not untested! I tested it locally by running it ;) I'll write up 
a couple test cases and some documentation.


I've posted one ticket  for 
the management command to pass a selected migration through the optimizer, 
and another ticket  for 
improvements to the optimizer itself.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/686a3cd9-c15c-4619-a2f7-5aa9f9978910%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Some thoughts about improving migration squashing

2017-02-15 Thread Markus Holtermann
Thanks Raphael, that's a pretty good write up!

You're essentially speaking about 2 things here, in my opinion:

1. Adding a new feature for interactive squash
2. Improving the MigrationOptimizer

I certainly see a point for 2. Not sure how much for 1. Anyway, your 
reasoning for 2 sounds great! I'd be more than happy if you want to get 
this into Django! Can you create a respective ticket and drop your 
explanation in there, please :)

Cheers,

/Markus

On Wednesday, February 15, 2017 at 1:22:11 PM UTC+1, rap...@makeleaps.com 
wrote:
>
> Hello everyone,
>
>I spent a couple hours last night trying to improve the migration 
> squasher optimizer (migrations were taking almost 15 minutes in CI). I came 
> up with a couple ideas for anyone interested in improvements:
>  
>  1- Having an interactive mode for squashing would be interested. 
> Currently, when squashing migrations, I do the following: 
>
>- Generate an initial squash
>- Edit it (namely, move around operations to get more optimizations to 
>work)
>- remove the "replaces" tag, then rerun migration squashing to 
>"re-optimize"
>- repeat until I get something I like, then add the original 
>"replaces" tag
>
>It would be cool if instead, the process were (with a flag):
>
>- Generate an initial squash, but have the process wait for 
>confirmation to "commit" this squash as final (though writing out the file)
>- Edit the file, and tell the process to try re-optimizing with the 
>same file (getting around the "no-squash of squashes" rule)
>- Potentially, allow us to also step back
>
>  For example, the "squashmigrations" command output could look like:
>
> generated 0001_squashed_mig.py
>> optmize migration[yN]? 
>> 
>> regenerated 0001_squashed_mig.py
>> ( 20 operations -> 10 operations)
>> optimize migration[Ynr]? 
>> 
>> regenerated 0001_squashed_mig.py
>> ( No change in operation count)
>> optimize migration[Ynr]? 
>> 
>> rolled back to previous version
>> optimize migration[Ynr]? 
>> 
>> Saved migration
>>
>
>
> A simpler version of this command would simply be to add an 
> "optimizemigration" command that just reads in a single migration and 
> optimizes the operations, without touching any of the squashiness. 
>
>
>  2- The reducer might be a bit too pessimistic
>
>  Currently, the optimizer lets "reduce" operations (that take 2 operations 
> and return 0,1, or 2 operations, or None if nothing can be change) do 
> whatever they want. Because of that, if you have [A, B, C ,D] and B depends 
> on A, you can't reduce A and C because the reduction might remove A.
>
>  In reality there are two kinds of reduction operations that we could be 
> taking into account:
>
>- reducing "left". if you have [A, B, C, D] (for example, A is a 
> CreateModel , C is an AddField for the same model),  and you can reduce A 
> and C into just A' (A with C), giving [A', B, D], then it doesn't matter 
> that B depends on A. 
>
> The thing that matters is if C depends on B (for example, C adds a foreign 
> key to a model created in B). This is actually already encoded in the 
> CreateMode + AddField reduction, but is perhaps a more general case.
>
> In a sense, reducing A and C "to the left" means that we're bringing A and 
> C closer together only by moving C. This is a major part of the potential 
> reductions that the current optimizer is missing.
>
>- reducing right. If you have [A, B, C, D] (for example, A is a 
> CreateModel, C is a RemoveModel for the same model), and you can reduce A 
> and C into just C' (C with A), giving [B, C', D], then it does matter that 
> B depends on A. C can't depend on B (assuming causality holds in our 
> universe)
>
> This is the current mechanism, essentially. If B depends on A, then you 
> can't move A past B. 
>
>  Removing both operations is a special case of reducing right (You can 
> make C' into a no-op).
>
> I had monkeypatched a special case of  reducing left (taking CreateModel, 
> AddField of different models and swapping them . For example 
> [CreateModel(A), CreateModel(B), AddField(A.foo)] -> [CreateModel(A), 
> AddField(A.foo), CreateModel(B)]) and got decent results, but I think 
> making the optimization code express these two concepts separately would 
> catch even more of the optimizations I saw that the optimizer didn't.
>
>
> I hope some of this is useful 
>  
>   Raphael
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/bcc73cef-e6f8-4aaf-b5a9-4592895b3b2e%40googlegroups.com.
For more options, visit

Re: Some thoughts about improving migration squashing

2017-02-15 Thread Markus Holtermann
What might be interesting to look into when squashing all migrations in one 
app would be to assume no migrations would exist. That could then result in 
only 2 migrations which could run through the optimizer (as opposed to 
let's say 20 migrations with many more operations).

/Markus

On Wednesday, February 15, 2017 at 3:46:01 PM UTC+1, Florian Apolloner 
wrote:
>
> Fwiw I think by default it could/should try to optimize all migrations of 
> an app, manually specifying the migration name should be optional.
>
> On Wednesday, February 15, 2017 at 2:00:54 PM UTC+1, rap...@makeleaps.com 
> wrote:
>>
>> I ended up having some time today, so wrote up a management command for 
>> the first suggestion!
>>
>> I called it "optizemigration"
>>
>>
>> >>> ./manage.py optimizemigration appname 0001_squashed
>>   # snipped django startup noise
>>   Optimized from 9 operations to 4 operations
>>
>> Optimized migration /Users/rtpg/proj/projname/projname/appname/migrations
>> /0001_squashed_20170215.py
>>
>>
>> This reads in the migration file, runs the migration optimizer, and then 
>> outputs to the same file. Writing it has paid off almost immediately for me.
>>
>> Those who are interested can take a look here 
>> 
>> .
>>
>> How much testing/coverage requirements are there for management commands 
>> like these?
>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/24896164-cf2f-4eb9-8d87-3932f2c30e78%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Some thoughts about improving migration squashing

2017-02-15 Thread Florian Apolloner
Fwiw I think by default it could/should try to optimize all migrations of 
an app, manually specifying the migration name should be optional.

On Wednesday, February 15, 2017 at 2:00:54 PM UTC+1, rap...@makeleaps.com 
wrote:
>
> I ended up having some time today, so wrote up a management command for 
> the first suggestion!
>
> I called it "optizemigration"
>
>
> >>> ./manage.py optimizemigration appname 0001_squashed
>   # snipped django startup noise
>   Optimized from 9 operations to 4 operations
>
> Optimized migration /Users/rtpg/proj/projname/projname/appname/migrations/
> 0001_squashed_20170215.py
>
>
> This reads in the migration file, runs the migration optimizer, and then 
> outputs to the same file. Writing it has paid off almost immediately for me.
>
> Those who are interested can take a look here 
> 
> .
>
> How much testing/coverage requirements are there for management commands 
> like these?
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/6055f360-b916-4bd3-8b9a-eea3768c1241%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Some thoughts about improving migration squashing

2017-02-15 Thread Tim Graham
Hi Raphael,

It looks like a similar idea was proposed in 
https://groups.google.com/d/topic/django-developers/C1L-NhyQYG4/discussion. 
I don't think a ticket was ever created, so you can do that.

100% test coverage is required. Why would we accept untested code? ;-)

On Wednesday, February 15, 2017 at 8:00:54 AM UTC-5, rap...@makeleaps.com 
wrote:
>
> I ended up having some time today, so wrote up a management command for 
> the first suggestion!
>
> I called it "optizemigration"
>
>
> >>> ./manage.py optimizemigration appname 0001_squashed
>   # snipped django startup noise
>   Optimized from 9 operations to 4 operations
>
> Optimized migration /Users/rtpg/proj/projname/projname/appname/migrations/
> 0001_squashed_20170215.py
>
>
> This reads in the migration file, runs the migration optimizer, and then 
> outputs to the same file. Writing it has paid off almost immediately for me.
>
> Those who are interested can take a look here 
> 
> .
>
> How much testing/coverage requirements are there for management commands 
> like these?
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/c9d9a4d2-49ab-448b-8e5c-13ea3795c2dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Some thoughts about improving migration squashing

2017-02-15 Thread raphael
I ended up having some time today, so wrote up a management command for the 
first suggestion!

I called it "optizemigration"


>>> ./manage.py optimizemigration appname 0001_squashed
  # snipped django startup noise
  Optimized from 9 operations to 4 operations

Optimized migration /Users/rtpg/proj/projname/projname/appname/migrations/
0001_squashed_20170215.py


This reads in the migration file, runs the migration optimizer, and then 
outputs to the same file. Writing it has paid off almost immediately for me.

Those who are interested can take a look here 

.

How much testing/coverage requirements are there for management commands 
like these?


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/b4924688-1001-4a2b-a8de-20e6a10ff83f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Some thoughts about improving migration squashing

2017-02-15 Thread raphael
Hello everyone,

   I spent a couple hours last night trying to improve the migration 
squasher optimizer (migrations were taking almost 15 minutes in CI). I came 
up with a couple ideas for anyone interested in improvements:
 
 1- Having an interactive mode for squashing would be interested. 
Currently, when squashing migrations, I do the following: 

   - Generate an initial squash
   - Edit it (namely, move around operations to get more optimizations to 
   work)
   - remove the "replaces" tag, then rerun migration squashing to 
   "re-optimize"
   - repeat until I get something I like, then add the original "replaces" 
   tag

   It would be cool if instead, the process were (with a flag):

   - Generate an initial squash, but have the process wait for confirmation 
   to "commit" this squash as final (though writing out the file)
   - Edit the file, and tell the process to try re-optimizing with the same 
   file (getting around the "no-squash of squashes" rule)
   - Potentially, allow us to also step back

 For example, the "squashmigrations" command output could look like:

generated 0001_squashed_mig.py
> optmize migration[yN]? 
> 
> regenerated 0001_squashed_mig.py
> ( 20 operations -> 10 operations)
> optimize migration[Ynr]? 
> 
> regenerated 0001_squashed_mig.py
> ( No change in operation count)
> optimize migration[Ynr]? 
> 
> rolled back to previous version
> optimize migration[Ynr]? 
> 
> Saved migration
>


A simpler version of this command would simply be to add an 
"optimizemigration" command that just reads in a single migration and 
optimizes the operations, without touching any of the squashiness. 


 2- The reducer might be a bit too pessimistic

 Currently, the optimizer lets "reduce" operations (that take 2 operations 
and return 0,1, or 2 operations, or None if nothing can be change) do 
whatever they want. Because of that, if you have [A, B, C ,D] and B depends 
on A, you can't reduce A and C because the reduction might remove A.

 In reality there are two kinds of reduction operations that we could be 
taking into account:

   - reducing "left". if you have [A, B, C, D] (for example, A is a 
CreateModel , C is an AddField for the same model),  and you can reduce A 
and C into just A' (A with C), giving [A', B, D], then it doesn't matter 
that B depends on A. 

The thing that matters is if C depends on B (for example, C adds a foreign 
key to a model created in B). This is actually already encoded in the 
CreateMode + AddField reduction, but is perhaps a more general case.

In a sense, reducing A and C "to the left" means that we're bringing A and 
C closer together only by moving C. This is a major part of the potential 
reductions that the current optimizer is missing.

   - reducing right. If you have [A, B, C, D] (for example, A is a 
CreateModel, C is a RemoveModel for the same model), and you can reduce A 
and C into just C' (C with A), giving [B, C', D], then it does matter that 
B depends on A. C can't depend on B (assuming causality holds in our 
universe)

This is the current mechanism, essentially. If B depends on A, then you 
can't move A past B. 

 Removing both operations is a special case of reducing right (You can make 
C' into a no-op).

I had monkeypatched a special case of  reducing left (taking CreateModel, 
AddField of different models and swapping them . For example 
[CreateModel(A), CreateModel(B), AddField(A.foo)] -> [CreateModel(A), 
AddField(A.foo), CreateModel(B)]) and got decent results, but I think 
making the optimization code express these two concepts separately would 
catch even more of the optimizations I saw that the optimizer didn't.


I hope some of this is useful 
 
  Raphael

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/6042857b-fd48-467a-badc-1cf5305ecac4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.