Re: [openstack-dev] [neutron][L3][QA] DVR job failure rate and maintainability

2015-09-15 Thread shihanzhang
Sean,
Thank you very much for writing this. DVR does indeed need more attention;
it's a very cool and useful feature, especially at large scale. It first
landed in Neutron in Juno, and it has been getting better and better through
the Kilo and Liberty development cycles. We use it in production, and in the
process we found the following bugs, which have not been fixed yet. We have
filed them on Launchpad:
1. Every time we create a VM, it triggers router scheduling. At large scale,
if a large number of L3 agents are bound to a DVR router, scheduling the
router takes a long time, even though the scheduling action is not
necessary. [1]
2. Every time we associate a floating IP with a VM, it also triggers router
scheduling and sends this floating IP to all bound L3 agents. [2]
3. When VMs are bulk-deleted from a compute node, leaving that node with no
VM on this router, the router namespace remains in most cases. [3]
4. Updating router_gateway triggers reschedule_router, and during
reschedule_router communication through this router is broken. For a DVR
router, why does the router need to be rescheduled, and to which L3 agents
is it rescheduled? [4]
5. Stale fip namespaces are not cleaned up on compute nodes. [5]
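
To make item 1 concrete, here is a toy sketch of the idea that scheduling
could be skipped when nothing would change. This is not Neutron code: the
class, method names, and data layout are all hypothetical and only model
"do the expensive scheduling once per router and host".

```python
# Toy model only: none of these names come from Neutron.
class ToyRouterScheduler:
    def __init__(self):
        # router_id -> set of hosts whose L3 agent is already bound
        self.bindings = {}
        # counts how often the expensive path actually ran
        self.full_schedules = 0

    def _full_schedule(self, router_id, host):
        # Stand-in for the expensive scheduling path described in item 1.
        self.full_schedules += 1
        self.bindings.setdefault(router_id, set()).add(host)

    def on_port_created(self, router_id, host):
        # Skip scheduling when this host's agent is already bound.
        if host in self.bindings.get(router_id, set()):
            return False
        self._full_schedule(router_id, host)
        return True
```

In this model, a second VM created on the same host for the same router
returns False and does no scheduling work.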


I strongly agree that we need a group of contributors that
can help with the DVR feature in the immediate term to fix the current bugs.
I would be very glad to join this group.


Neutroners, let's start doing great things!


Thanks,
Hanzhang,Shi


[1] https://bugs.launchpad.net/neutron/+bug/1486795
[2] https://bugs.launchpad.net/neutron/+bug/1486828
[3] https://bugs.launchpad.net/neutron/+bug/1496201
[4] https://bugs.launchpad.net/neutron/+bug/1496204
[5] https://bugs.launchpad.net/neutron/+bug/1470909

At 2015-09-15 06:01:03, "Sean M. Collins" wrote:
>[adding neutron tag to subject and resending]
>
>Hi,
>
>Carl Baldwin, Doug Wiegley, Matt Kassawara, Ryan Moats, and I are
>at the QA sprint in Fort Collins. Earlier today there was a discussion
>about the failure rate of the DVR job and the possible impact that
>it is having on the gate.
>
>Ryan has a good patch up that shows the failure rates over time:
>
>https://review.openstack.org/223201
>
>To view the graphs, go into your neutron git repo and open the
>.html files in doc/dashboards, which should open
>your browser and display the Graphite query.
>
>Doug put up a patch to change the DVR job to be non-voting while we
>determine the cause of the recent spikes:
>
>https://review.openstack.org/223173
>
>There was a good discussion after pushing the patch, revolving around
>the need for Neutron to have DVR, to fit operational and reliability
>requirements, and help transition away from Nova-Network by providing
>one of many solutions similar to Nova's multihost feature.  I'm skipping
>over a huge amount of context about the Nova-Network and Neutron work,
>since that is a big and ongoing effort. 
>
>DVR is an important feature to have, and we need to ensure that the job
>that tests DVR has a high pass rate.
>
>One thing that I think we need is to form a group of contributors that
>can help with the DVR feature in the immediate term to fix the current
>bugs, and in the longer term maintain the feature. It's a big task and I
>don't believe that a single person or company can or should do it alone.
>
>The L3 group is a good place to start, but I think that even within the
>L3 team we need a dedicated and diverse group of people who are interested
>in maintaining the DVR feature.
>
>Without this, I think the DVR feature will start to bit-rot and that
>will have a significant impact on our ability to recommend Neutron as a
>replacement for Nova-Network in the future.
>
>-- 
>Sean M. Collins
>
>__
>OpenStack Development Mailing List (not for usage questions)
>Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron][L3][QA] DVR job failure rate and maintainability

2015-09-15 Thread Carl Baldwin
Sean,

Thank you for writing this.  It is clear that we have some work to do
and we need more attention on this.  We were able to get the job
voting a few months ago when the failure rates for all the jobs were
at a low point.  However, we never really addressed the fact that this
job has always had a slightly higher failure rate than its non-DVR
counterpart.  DVR is a supported feature now and we need to be behind
it.

I'm adding this to the agenda for the L3 meeting this Thursday [1].
Let's dedicate real talent and time to getting to the bottom of the
higher failure rate, driving the bugs out, and making the enhancements
needed to make this feature what it should be.

Carl

[1] https://wiki.openstack.org/wiki/Meetings/Neutron-L3-Subteam

On Mon, Sep 14, 2015 at 4:01 PM, Sean M. Collins wrote:
> Hi,
>
> Carl Baldwin, Doug Wiegley, Matt Kassawara, Ryan Moats, and I are
> at the QA sprint in Fort Collins. Earlier today there was a discussion
> about the failure rate of the DVR job and the possible impact that
> it is having on the gate.
>
> Ryan has a good patch up that shows the failure rates over time:
>
> https://review.openstack.org/223201
>
> To view the graphs, go into your neutron git repo and open the
> .html files in doc/dashboards, which should open
> your browser and display the Graphite query.
>
> Doug put up a patch to change the DVR job to be non-voting while we
> determine the cause of the recent spikes:
>
> https://review.openstack.org/223173
>
> There was a good discussion after pushing the patch, revolving around
> the need for Neutron to have DVR, to fit operational and reliability
> requirements, and help transition away from Nova-Network by providing
> one of many solutions similar to Nova's multihost feature.  I'm skipping
> over a huge amount of context about the Nova-Network and Neutron work,
> since that is a big and ongoing effort.
>
> DVR is an important feature to have, and we need to ensure that the job
> that tests DVR has a high pass rate.
>
> One thing that I think we need is to form a group of contributors that
> can help with the DVR feature in the immediate term to fix the current
> bugs, and in the longer term maintain the feature. It's a big task and I
> don't believe that a single person or company can or should do it alone.
>
> The L3 group is a good place to start, but I think that even within the
> L3 team we need a dedicated and diverse group of people who are interested
> in maintaining the DVR feature.
>
> Without this, I think the DVR feature will start to bit-rot and that
> will have a significant impact on our ability to recommend Neutron as a
> replacement for Nova-Network in the future.
>
> --
> Sean M. Collins


Re: [openstack-dev] [neutron][L3][QA] DVR job failure rate and maintainability

2015-09-15 Thread Ryan Moats
I couldn't have said it better, Sean.

Ryan Moats

"Sean M. Collins" <s...@coreitpro.com> wrote on 09/14/2015 05:01:03 PM:

> From: "Sean M. Collins" <s...@coreitpro.com>
> To: "OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev@lists.openstack.org>
> Date: 09/14/2015 05:01 PM
> Subject: [openstack-dev] [neutron][L3][QA] DVR job failure rate and
> maintainability
>
> [adding neutron tag to subject and resending]
>
> Hi,
>
> Carl Baldwin, Doug Wiegley, Matt Kassawara, Ryan Moats, and I are
> at the QA sprint in Fort Collins. Earlier today there was a discussion
> about the failure rate of the DVR job and the possible impact that
> it is having on the gate.
>
> Ryan has a good patch up that shows the failure rates over time:
>
> https://review.openstack.org/223201
>
> To view the graphs, go into your neutron git repo and open the
> .html files in doc/dashboards, which should open
> your browser and display the Graphite query.
>
> Doug put up a patch to change the DVR job to be non-voting while we
> determine the cause of the recent spikes:
>
> https://review.openstack.org/223173
>
> There was a good discussion after pushing the patch, revolving around
> the need for Neutron to have DVR, to fit operational and reliability
> requirements, and help transition away from Nova-Network by providing
> one of many solutions similar to Nova's multihost feature.  I'm skipping
> over a huge amount of context about the Nova-Network and Neutron work,
> since that is a big and ongoing effort.
>
> DVR is an important feature to have, and we need to ensure that the job
> that tests DVR has a high pass rate.
>
> One thing that I think we need is to form a group of contributors that
> can help with the DVR feature in the immediate term to fix the current
> bugs, and in the longer term maintain the feature. It's a big task and I
> don't believe that a single person or company can or should do it alone.
>
> The L3 group is a good place to start, but I think that even within the
> L3 team we need a dedicated and diverse group of people who are interested
> in maintaining the DVR feature.
>
> Without this, I think the DVR feature will start to bit-rot and that
> will have a significant impact on our ability to recommend Neutron as a
> replacement for Nova-Network in the future.
>
> --
> Sean M. Collins


[openstack-dev] [neutron][L3][QA] DVR job failure rate and maintainability

2015-09-14 Thread Sean M. Collins
[adding neutron tag to subject and resending]

Hi,

Carl Baldwin, Doug Wiegley, Matt Kassawara, Ryan Moats, and I are
at the QA sprint in Fort Collins. Earlier today there was a discussion
about the failure rate of the DVR job and the possible impact that
it is having on the gate.

Ryan has a good patch up that shows the failure rates over time:

https://review.openstack.org/223201

To view the graphs, go into your neutron git repo and open the
.html files in doc/dashboards, which should open
your browser and display the Graphite query.
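
If you would rather not open each file by hand, a small helper along these
lines might do it. This is only a sketch: the function names are made up
for illustration, and it assumes nothing beyond the doc/dashboards
directory mentioned above and the Python standard library.

```python
from pathlib import Path
import webbrowser


def find_dashboards(repo_root):
    """Return the sorted .html files under <repo_root>/doc/dashboards."""
    dashboards_dir = Path(repo_root, "doc", "dashboards")
    return sorted(dashboards_dir.glob("*.html"))


def open_dashboards(repo_root):
    """Open every dashboard page in the default web browser."""
    for page in find_dashboards(repo_root):
        # as_uri() needs an absolute path, hence resolve().
        webbrowser.open(page.resolve().as_uri())
```

Calling open_dashboards(".") from the top of the neutron checkout should
open one browser tab per dashboard file.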

Doug put up a patch to change the DVR job to be non-voting while we
determine the cause of the recent spikes:

https://review.openstack.org/223173

There was a good discussion after pushing the patch, revolving around
the need for Neutron to have DVR, to fit operational and reliability
requirements, and help transition away from Nova-Network by providing
one of many solutions similar to Nova's multihost feature.  I'm skipping
over a huge amount of context about the Nova-Network and Neutron work,
since that is a big and ongoing effort. 

DVR is an important feature to have, and we need to ensure that the job
that tests DVR has a high pass rate.

One thing that I think we need is to form a group of contributors that
can help with the DVR feature in the immediate term to fix the current
bugs, and in the longer term maintain the feature. It's a big task and I
don't believe that a single person or company can or should do it alone.

The L3 group is a good place to start, but I think that even within the
L3 team we need a dedicated and diverse group of people who are interested
in maintaining the DVR feature.

Without this, I think the DVR feature will start to bit-rot and that
will have a significant impact on our ability to recommend Neutron as a
replacement for Nova-Network in the future.

-- 
Sean M. Collins
