Re: [openstack-dev] [Neutron][IPAM] Pluggable IPAM rollback issue

2016-08-05 Thread Carl Baldwin
On Tue, Aug 2, 2016 at 6:55 PM, Carl Baldwin  wrote:

> On Aug 2, 2016 6:52 PM, "Kevin Benton"  wrote:
> > If we decide to just fix the exception handler inside of ipam itself for
> rollbacks (which would be a quick fix), I would be okay with that but we
> need to be clear that any driver depending on that alone for state
> synchronization is in a very dangerous position of becoming inconsistent
> (i.e. I want something to point people to if we get bug reports saying that
> the delete call wasn't made when the port failed to create).
>
> I think we could fix it in steps. I do think that both issues are worth
> fixing and will pursue them both. I'll file a bugs.
>
After some discussion in IRC [2], I think I have a plan. The short term fix
is to stop calling rollback for the in-tree driver. Since it uses the same
DB session as Neutron, its changes will be rolled back regardless. I
implemented that in the context of the original patch that I linked early
in this thread [1]. I also cleaned up the unit test *a lot* now that I had
a little time to see what it really needed to do. I think that can merge
now to fix rollback for the in-tree driver (which would be the only one
broken in this way anyway).

This still leaves the fact that IPAM rollback is really pretty broken for
other drivers. This isn't new in Newton but now we understand better how
badly it is broken. I've filed a bug about that [3].

Carl

[1] https://review.openstack.org/#/c/348956
[2]
http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2016-08-03.log.html#t2016-08-03T18:08:58
[3] https://bugs.launchpad.net/neutron/+bug/1610483
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPAM] Pluggable IPAM rollback issue

2016-08-02 Thread Carl Baldwin
On Aug 2, 2016 6:52 PM, "Kevin Benton"  wrote:
>
> >It might be the wrong impression, but it was already given and there are
drivers which have been written under it. That's why I tend toward fixing
rollback instead of eliminating it.
>
> The reason I thought it was relevant to bring up is because it's going to
be difficult to actually fix it. If any of the following lines fail, none
of the IPAM rollback code will be called:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/plugin.py#L1195-L1215

Agreed. Just saying why i didn't bring it up on this context to start with.

> If we decide to just fix the exception handler inside of ipam itself for
rollbacks (which would be a quick fix), I would be okay with that but we
need to be clear that any driver depending on that alone for state
synchronization is in a very dangerous position of becoming inconsistent
(i.e. I want something to point people to if we get bug reports saying that
the delete call wasn't made when the port failed to create).

I think we could fix it in steps. I do think that both issues are worth
fixing and will pursue them both. I'll file a bugs.

Carl

> On Tue, Aug 2, 2016 at 3:27 PM, Carl Baldwin  wrote:
>>
>> On Tue, Aug 2, 2016 at 2:50 AM, Kevin Benton  wrote:
>>>
>>> >Given that it shares the session, it wouldn't have to do anything.
But, again, it wouldn't behave like an external driver.
>>>
>>> Why not? The only additional thing an external driver would be doing at
this step is calling an external system. Any local accounting in the DB
that driver would do would automatically be rolled back just like the
in-tree system.
>>
>> See below.
>>>
>>> Keep in mind that anything else can fail later in the transaction
outside of IPAM (e.g. ML2 driver precommit validation) and none of this
IPAM rollback logic will be called. Maybe the right answer is to get rid of
the IPAM rollback logic completely because it gives the wrong impression
that it is going to be called on all failures to commit when it's really
only called in failures inside of IPAM's module. Essentially every instance
of _safe_rollback in [1] is in an exception handler that isn't triggered if
there are exceptions anywhere in the core plugin after the initial base DB
calls.
>>
>> I noticed that there are failures which will not call rollback. I
started thinking about it when I wrote this note [1] (I did realize that
rollback might not even happen with the flush under the likes of galera and
that there are other failures that can happen outside of this and would
fail to rollback). I didn't bring it up here because I thought it was a
separate issue.
>>
>> It might be the wrong impression, but it was already given and there are
drivers which have been written under it. That's why I tend toward fixing
rollback instead of eliminating it. If we eliminate the idea, it isn't
clear to me yet how drivers will handle leaked allocations (or
deallocations). We could talk about some alternatives. I've got a few
knocking around the back of my head but nothing that seems like a complete
picture yet.
>>
>> If one only cares about the in-tree driver which doesn't need an
explicit rollback call then one probably wouldn't care about having one at
all. This is the kind of thing I'd like to avoid by having the in-tree
driver work more like any other external driver. When the in-tree driver
works differently than others because it has a closer relationship to the
rest of the system, we quickly forget the needs of other drivers.
>>
>> Carl
>>
>> [1]
https://review.openstack.org/#/c/348956/1/neutron/tests/unit/extensions/test_segment.py@793
>>
>>> 1.
https://github.com/openstack/neutron/blob/master/neutron/db/ipam_pluggable_backend.py
>>>
>>>
>>> On Aug 1, 2016 18:11, "Carl Baldwin"  wrote:



 On Mon, Aug 1, 2016 at 2:29 PM, Kevin Benton  wrote:
>
> >We still want the exception to rollback the entire API operation and
stopping it with a nested operation I think would mess that up.
>
> Well I think you would want to start a nested transaction, capture
the duplicate, call the ipam delete methods, then throw a retryrequest. The
exception will still trigger a rollback of the entire operation.


 This is kind of what I was headed when I decided to solicit some
feedback. It is a possibility should still be considered.

>
> >Second, I've been throwing around the idea of not sharing the
session with the IPAM driver.
>
> If the IPAM driver does not have access to the session, it can't see
any of the uncommitted data. Would that be a problem? In particular,
doesn't the IPAM driver's DB table have foreign key constraints with the
data waiting to be committed in the other session? I'm hesitant to take
this approach because it means other (if the in-tree doesn't already) IPAM
drivers cannot have any relational integrity with the objects in 

Re: [openstack-dev] [Neutron][IPAM] Pluggable IPAM rollback issue

2016-08-02 Thread Kevin Benton
>It might be the wrong impression, but it was already given and there are
drivers which have been written under it. That's why I tend toward fixing
rollback instead of eliminating it.

The reason I thought it was relevant to bring up is because it's going to
be difficult to actually fix it. If any of the following lines fail, none
of the IPAM rollback code will be called:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/plugin.py#L1195-L1215

If we decide to just fix the exception handler inside of ipam itself for
rollbacks (which would be a quick fix), I would be okay with that but we
need to be clear that any driver depending on that alone for state
synchronization is in a very dangerous position of becoming inconsistent
(i.e. I want something to point people to if we get bug reports saying that
the delete call wasn't made when the port failed to create).

On Tue, Aug 2, 2016 at 3:27 PM, Carl Baldwin  wrote:

> On Tue, Aug 2, 2016 at 2:50 AM, Kevin Benton  wrote:
>
>> >Given that it shares the session, it wouldn't have to do anything. But,
>> again, it wouldn't behave like an external driver.
>>
> Why not? The only additional thing an external driver would be doing at
>> this step is calling an external system. Any local accounting in the DB
>> that driver would do would automatically be rolled back just like the
>> in-tree system.
>>
> See below.
>
>> Keep in mind that anything else can fail later in the transaction outside
>> of IPAM (e.g. ML2 driver precommit validation) and none of this IPAM
>> rollback logic will be called. Maybe the right answer is to get rid of the
>> IPAM rollback logic completely because it gives the wrong impression that
>> it is going to be called on all failures to commit when it's really only
>> called in failures inside of IPAM's module. Essentially every instance of
>> _safe_rollback in [1] is in an exception handler that isn't triggered if
>> there are exceptions anywhere in the core plugin after the initial base DB
>> calls.
>>
> I noticed that there are failures which will not call rollback. I started
> thinking about it when I wrote this note [1] (I did realize that rollback
> might not even happen with the flush under the likes of galera and that
> there are other failures that can happen outside of this and would fail to
> rollback). I didn't bring it up here because I thought it was a separate
> issue.
>
> It might be the wrong impression, but it was already given and there are
> drivers which have been written under it. That's why I tend toward fixing
> rollback instead of eliminating it. If we eliminate the idea, it isn't
> clear to me yet how drivers will handle leaked allocations (or
> deallocations). We could talk about some alternatives. I've got a few
> knocking around the back of my head but nothing that seems like a complete
> picture yet.
>
> If one only cares about the in-tree driver which doesn't need an explicit
> rollback call then one probably wouldn't care about having one at all. This
> is the kind of thing I'd like to avoid by having the in-tree driver work
> more like any other external driver. When the in-tree driver works
> differently than others because it has a closer relationship to the rest of
> the system, we quickly forget the needs of other drivers.
>
> Carl
>
> [1]
> https://review.openstack.org/#/c/348956/1/neutron/tests/unit/extensions/test_segment.py@793
>
> 1.
>> https://github.com/openstack/neutron/blob/master/neutron/db/ipam_pluggable_backend.py
>>
>> On Aug 1, 2016 18:11, "Carl Baldwin"  wrote:
>>
>>>
>>>
>>> On Mon, Aug 1, 2016 at 2:29 PM, Kevin Benton  wrote:
>>>
 >We still want the exception to rollback the entire API operation and
 stopping it with a nested operation I think would mess that up.

 Well I think you would want to start a nested transaction, capture the
 duplicate, call the ipam delete methods, then throw a retryrequest. The
 exception will still trigger a rollback of the entire operation.

>>>
>>> This is kind of what I was headed when I decided to solicit some
>>> feedback. It is a possibility should still be considered.
>>>
>>>
 >Second, I've been throwing around the idea of not sharing the session
 with the IPAM driver.

 If the IPAM driver does not have access to the session, it can't see
 any of the uncommitted data. Would that be a problem? In particular,
 doesn't the IPAM driver's DB table have foreign key constraints with the
 data waiting to be committed in the other session? I'm hesitant to take
 this approach because it means other (if the in-tree doesn't already) IPAM
 drivers cannot have any relational integrity with the objects in question.

>>>
>>> The in-tree driver doesn't have any FK constraints back to the neutron
>>> db schema for IPAM [1]. I don't think that would make sense since it is
>>> supposed to work like an external 

Re: [openstack-dev] [Neutron][IPAM] Pluggable IPAM rollback issue

2016-08-02 Thread Carl Baldwin
On Tue, Aug 2, 2016 at 2:50 AM, Kevin Benton  wrote:

> >Given that it shares the session, it wouldn't have to do anything. But,
> again, it wouldn't behave like an external driver.
>
Why not? The only additional thing an external driver would be doing at
> this step is calling an external system. Any local accounting in the DB
> that driver would do would automatically be rolled back just like the
> in-tree system.
>
See below.

> Keep in mind that anything else can fail later in the transaction outside
> of IPAM (e.g. ML2 driver precommit validation) and none of this IPAM
> rollback logic will be called. Maybe the right answer is to get rid of the
> IPAM rollback logic completely because it gives the wrong impression that
> it is going to be called on all failures to commit when it's really only
> called in failures inside of IPAM's module. Essentially every instance of
> _safe_rollback in [1] is in an exception handler that isn't triggered if
> there are exceptions anywhere in the core plugin after the initial base DB
> calls.
>
I noticed that there are failures which will not call rollback. I started
thinking about it when I wrote this note [1] (I did realize that rollback
might not even happen with the flush under the likes of galera and that
there are other failures that can happen outside of this and would fail to
rollback). I didn't bring it up here because I thought it was a separate
issue.

It might be the wrong impression, but it was already given and there are
drivers which have been written under it. That's why I tend toward fixing
rollback instead of eliminating it. If we eliminate the idea, it isn't
clear to me yet how drivers will handle leaked allocations (or
deallocations). We could talk about some alternatives. I've got a few
knocking around the back of my head but nothing that seems like a complete
picture yet.

If one only cares about the in-tree driver which doesn't need an explicit
rollback call then one probably wouldn't care about having one at all. This
is the kind of thing I'd like to avoid by having the in-tree driver work
more like any other external driver. When the in-tree driver works
differently than others because it has a closer relationship to the rest of
the system, we quickly forget the needs of other drivers.

Carl

[1]
https://review.openstack.org/#/c/348956/1/neutron/tests/unit/extensions/test_segment.py@793

1.
> https://github.com/openstack/neutron/blob/master/neutron/db/ipam_pluggable_backend.py
>
> On Aug 1, 2016 18:11, "Carl Baldwin"  wrote:
>
>>
>>
>> On Mon, Aug 1, 2016 at 2:29 PM, Kevin Benton  wrote:
>>
>>> >We still want the exception to rollback the entire API operation and
>>> stopping it with a nested operation I think would mess that up.
>>>
>>> Well I think you would want to start a nested transaction, capture the
>>> duplicate, call the ipam delete methods, then throw a retryrequest. The
>>> exception will still trigger a rollback of the entire operation.
>>>
>>
>> This is kind of what I was headed when I decided to solicit some
>> feedback. It is a possibility should still be considered.
>>
>>
>>> >Second, I've been throwing around the idea of not sharing the session
>>> with the IPAM driver.
>>>
>>> If the IPAM driver does not have access to the session, it can't see any
>>> of the uncommitted data. Would that be a problem? In particular, doesn't
>>> the IPAM driver's DB table have foreign key constraints with the data
>>> waiting to be committed in the other session? I'm hesitant to take this
>>> approach because it means other (if the in-tree doesn't already) IPAM
>>> drivers cannot have any relational integrity with the objects in question.
>>>
>>
>> The in-tree driver doesn't have any FK constraints back to the neutron db
>> schema for IPAM [1]. I don't think that would make sense since it is
>> supposed to work like an external driver.
>>
>>
>>> A related question is, why does the in-tree IPAM driver have to do
>>> anything at all on a rollback? It currently does share a session which is
>>> automatically going to rollback all of it's DB operations for it. If it's
>>> because the driver cannot distinguish a delete call from a rollback and a
>>> normal delete, I suggest we change the delete call to pass a flag
>>> indicating that it's for a rollback. That would allow any DB-based drivers
>>> to just do nothing at this step.
>>>
>>
>> Given that it shares the session, it wouldn't have to do anything. But,
>> again, it wouldn't behave like an external driver. I'd like to not have
>> special drivers that behave differently than drivers that are really
>> external; we end up finding things that the in-tree driver does in our
>> testing that doesn't work right for other drivers.
>>
>> Drivers might need to access uncommitted data from the neutron DB. I
>> think even external drivers do this. However, there is a hard line between
>> the Neutron tables (even IPAM related ones) and the pluggable 

Re: [openstack-dev] [Neutron][IPAM] Pluggable IPAM rollback issue

2016-08-02 Thread Kevin Benton
>Given that it shares the session, it wouldn't have to do anything. But,
again, it wouldn't behave like an external driver.

Why not? The only additional thing an external driver would be doing at
this step is calling an external system. Any local accounting in the DB
that driver would do would automatically be rolled back just like the
in-tree system.

Keep in mind that anything else can fail later in the transaction outside
of IPAM (e.g. ML2 driver precommit validation) and none of this IPAM
rollback logic will be called. Maybe the right answer is to get rid of the
IPAM rollback logic completely because it gives the wrong impression that
it is going to be called on all failures to commit when it's really only
called in failures inside of IPAM's module. Essentially every instance of
_safe_rollback in [1] is in an exception handler that isn't triggered if
there are exceptions anywhere in the core plugin after the initial base DB
calls.

1.
https://github.com/openstack/neutron/blob/master/neutron/db/ipam_pluggable_backend.py

On Aug 1, 2016 18:11, "Carl Baldwin"  wrote:

>
>
> On Mon, Aug 1, 2016 at 2:29 PM, Kevin Benton  wrote:
>
>> >We still want the exception to rollback the entire API operation and
>> stopping it with a nested operation I think would mess that up.
>>
>> Well I think you would want to start a nested transaction, capture the
>> duplicate, call the ipam delete methods, then throw a retryrequest. The
>> exception will still trigger a rollback of the entire operation.
>>
>
> This is kind of what I was headed when I decided to solicit some feedback.
> It is a possibility should still be considered.
>
>
>> >Second, I've been throwing around the idea of not sharing the session
>> with the IPAM driver.
>>
>> If the IPAM driver does not have access to the session, it can't see any
>> of the uncommitted data. Would that be a problem? In particular, doesn't
>> the IPAM driver's DB table have foreign key constraints with the data
>> waiting to be committed in the other session? I'm hesitant to take this
>> approach because it means other (if the in-tree doesn't already) IPAM
>> drivers cannot have any relational integrity with the objects in question.
>>
>
> The in-tree driver doesn't have any FK constraints back to the neutron db
> schema for IPAM [1]. I don't think that would make sense since it is
> supposed to work like an external driver.
>
>
>> A related question is, why does the in-tree IPAM driver have to do
>> anything at all on a rollback? It currently does share a session which is
>> automatically going to rollback all of it's DB operations for it. If it's
>> because the driver cannot distinguish a delete call from a rollback and a
>> normal delete, I suggest we change the delete call to pass a flag
>> indicating that it's for a rollback. That would allow any DB-based drivers
>> to just do nothing at this step.
>>
>
> Given that it shares the session, it wouldn't have to do anything. But,
> again, it wouldn't behave like an external driver. I'd like to not have
> special drivers that behave differently than drivers that are really
> external; we end up finding things that the in-tree driver does in our
> testing that doesn't work right for other drivers.
>
> Drivers might need to access uncommitted data from the neutron DB. I think
> even external drivers do this. However, there is a hard line between the
> Neutron tables (even IPAM related ones) and the pluggable IPAM driver
> database schema. I should have been a little more explicit that I wasn't
> suggesting that we hide the Neutron session from the driver. What I meant
> to suggest is that we use a different session for the part of the database
> schema that belongs solely to the driver. All of the changes would be
> inside the driver implementation and the interface to the driver wouldn't
> change at all.
>
> Carl
>
> [1]
> https://github.com/openstack/neutron/blob/2b1c143ca9/neutron/ipam/drivers/neutrondb_ipam/db_models.py
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPAM] Pluggable IPAM rollback issue

2016-08-01 Thread Carl Baldwin
On Mon, Aug 1, 2016 at 2:29 PM, Kevin Benton  wrote:

> >We still want the exception to rollback the entire API operation and
> stopping it with a nested operation I think would mess that up.
>
> Well I think you would want to start a nested transaction, capture the
> duplicate, call the ipam delete methods, then throw a retryrequest. The
> exception will still trigger a rollback of the entire operation.
>

This is kind of what I was headed when I decided to solicit some feedback.
It is a possibility should still be considered.


> >Second, I've been throwing around the idea of not sharing the session
> with the IPAM driver.
>
> If the IPAM driver does not have access to the session, it can't see any
> of the uncommitted data. Would that be a problem? In particular, doesn't
> the IPAM driver's DB table have foreign key constraints with the data
> waiting to be committed in the other session? I'm hesitant to take this
> approach because it means other (if the in-tree doesn't already) IPAM
> drivers cannot have any relational integrity with the objects in question.
>

The in-tree driver doesn't have any FK constraints back to the neutron db
schema for IPAM [1]. I don't think that would make sense since it is
supposed to work like an external driver.


> A related question is, why does the in-tree IPAM driver have to do
> anything at all on a rollback? It currently does share a session which is
> automatically going to rollback all of it's DB operations for it. If it's
> because the driver cannot distinguish a delete call from a rollback and a
> normal delete, I suggest we change the delete call to pass a flag
> indicating that it's for a rollback. That would allow any DB-based drivers
> to just do nothing at this step.
>

Given that it shares the session, it wouldn't have to do anything. But,
again, it wouldn't behave like an external driver. I'd like to not have
special drivers that behave differently than drivers that are really
external; we end up finding things that the in-tree driver does in our
testing that doesn't work right for other drivers.

Drivers might need to access uncommitted data from the neutron DB. I think
even external drivers do this. However, there is a hard line between the
Neutron tables (even IPAM related ones) and the pluggable IPAM driver
database schema. I should have been a little more explicit that I wasn't
suggesting that we hide the Neutron session from the driver. What I meant
to suggest is that we use a different session for the part of the database
schema that belongs solely to the driver. All of the changes would be
inside the driver implementation and the interface to the driver wouldn't
change at all.

Carl

[1]
https://github.com/openstack/neutron/blob/2b1c143ca9/neutron/ipam/drivers/neutrondb_ipam/db_models.py
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPAM] Pluggable IPAM rollback issue

2016-08-01 Thread Kevin Benton
>We still want the exception to rollback the entire API operation and
stopping it with a nested operation I think would mess that up.

Well I think you would want to start a nested transaction, capture the
duplicate, call the ipam delete methods, then throw a retryrequest. The
exception will still trigger a rollback of the entire operation.


>Second, I've been throwing around the idea of not sharing the session with
the IPAM driver.

If the IPAM driver does not have access to the session, it can't see any of
the uncommitted data. Would that be a problem? In particular, doesn't the
IPAM driver's DB table have foreign key constraints with the data waiting
to be committed in the other session? I'm hesitant to take this approach
because it means other (if the in-tree doesn't already) IPAM drivers cannot
have any relational integrity with the objects in question.

A related question is, why does the in-tree IPAM driver have to do anything
at all on a rollback? It currently does share a session which is
automatically going to rollback all of it's DB operations for it. If it's
because the driver cannot distinguish a delete call from a rollback and a
normal delete, I suggest we change the delete call to pass a flag
indicating that it's for a rollback. That would allow any DB-based drivers
to just do nothing at this step.



On Mon, Aug 1, 2016 at 12:28 PM, Carl Baldwin  wrote:

> Hi all,
>
> Last Thursday, I spent the afternoon looking in to a bug with pluggable
> IPAM [1] which is preventing me from deciding to pull the trigger on
> finally switching from the old non-pluggable reference implementation. I'd
> *really* like to get this in shape for Newton but time is running out.
>
> I've written a unit test [2] which manages to tickle the issue with
> rollback. It is a bit convoluted but basically it hijacks the call to
> _store_ip_allocation to reliably simulate another process racing for the
> same address and writes an allocation to the DB for the same ip address it
> is trying to allocate. The unit test also has to have something to rollback
> so it allocates two fixed ips to the port. The first one succeeds and then
> the second one fails (hence the initial call to "skip_one").
>
> I think the issue stems from the fact that the reference driver for
> pluggable IPAM shares the session with the rest of the API call. Writing
> the allocation fails with a DBDuplicate in the main part of the API call
> but then tries to rollback. Rollback fails because the session has already
> been "broken" by the duplicate exception.
>
> To fix this, I've thrown around a couple of ideas. First, I thought of
> maybe adding nested transactions in key places to isolate the part that is
> going to break. I could continue to pursue this but it is as
> straight-forward as it first seemed to me. We still want the exception to
> rollback the entire API operation and stopping it with a nested operation I
> think would mess that up.
>
> Second, I've been throwing around the idea of not sharing the session with
> the IPAM driver. I recall we had some discussion about this a long time ago
> but I cannot remember the conclusion. To me, it seems that the IPAM driver
> should not be sharing a session with the main API call. This would put it
> on par with other external IPAM drivers which would certainly not be
> sharing a DB context, let alone the DB itself, with the neutron API.
>
> Do you have any thoughts on this?
>
> Carl
>
> [1] https://bugs.launchpad.net/neutron/+bug/1603162
> [2] https://review.openstack.org/#/c/348956/
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Neutron][IPAM] Pluggable IPAM rollback issue

2016-08-01 Thread Carl Baldwin
Hi all,

Last Thursday, I spent the afternoon looking in to a bug with pluggable
IPAM [1] which is preventing me from deciding to pull the trigger on
finally switching from the old non-pluggable reference implementation. I'd
*really* like to get this in shape for Newton but time is running out.

I've written a unit test [2] which manages to tickle the issue with
rollback. It is a bit convoluted but basically it hijacks the call to
_store_ip_allocation to reliably simulate another process racing for the
same address and writes an allocation to the DB for the same ip address it
is trying to allocate. The unit test also has to have something to rollback
so it allocates two fixed ips to the port. The first one succeeds and then
the second one fails (hence the initial call to "skip_one").

I think the issue stems from the fact that the reference driver for
pluggable IPAM shares the session with the rest of the API call. Writing
the allocation fails with a DBDuplicate in the main part of the API call
but then tries to rollback. Rollback fails because the session has already
been "broken" by the duplicate exception.

To fix this, I've thrown around a couple of ideas. First, I thought of
maybe adding nested transactions in key places to isolate the part that is
going to break. I could continue to pursue this but it is as
straight-forward as it first seemed to me. We still want the exception to
rollback the entire API operation and stopping it with a nested operation I
think would mess that up.

Second, I've been throwing around the idea of not sharing the session with
the IPAM driver. I recall we had some discussion about this a long time ago
but I cannot remember the conclusion. To me, it seems that the IPAM driver
should not be sharing a session with the main API call. This would put it
on par with other external IPAM drivers which would certainly not be
sharing a DB context, let alone the DB itself, with the neutron API.

Do you have any thoughts on this?

Carl

[1] https://bugs.launchpad.net/neutron/+bug/1603162
[2] https://review.openstack.org/#/c/348956/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev