Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-07 Thread Isaku Yamahata
Mathieu, thank you for the clarification.
I'll take a look at the patches.

On Tue, Jan 07, 2014 at 02:34:24PM +0100,
Salvatore Orlando  wrote:

> Thanks Mathieu!
> 
> I think we should first merge Edouard's patch, which appears to be a
> prerequisite.
> I think we could benefit a lot by applying this mechanism to
> process_network_ports.
> 
> However, I am not sure if there could be drawbacks arising from the fact
> that the agent would assign the local VLAN tag to the port (either the lvm id
> or the DEAD_VLAN tag) and only at the end of the iteration apply the flow
> modifications, such as the drop-all rule.
> This will probably create a short interval of time in which we might see
> unexpected behaviours (such as VMs on the DEAD VLAN being able to communicate
> with each other, for instance).

Agreed that a more carefully ordered update is necessary with deferred
application.

Thanks,
Isaku Yamahata
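A minimal sketch of the deferred-application pattern under discussion, for
reference: the local VLAN tag and the matching flow rules for each port are
queued and flushed together, so ports do not sit half-configured while flows
trickle in one by one. The defer_apply_on()/defer_apply_off() names are the
ones used in this thread; the bridge object and the per-port helpers are
illustrative assumptions, not the code of the patches referenced below.

def process_ports_deferred(bridge, ports):
    # Start queueing flow modifications instead of applying them one by one.
    bridge.defer_apply_on()
    try:
        for port in ports:
            # Tag the port with its local VLAN (or the dead VLAN) ...
            bridge.set_db_attribute('Port', port.name, 'tag', port.vlan_tag)
            # ... and queue the matching flow rule in the same batch,
            # so the tag and the flow are applied close together.
            if port.dead:
                bridge.add_flow(priority=2, in_port=port.ofport,
                                actions='drop')
            else:
                bridge.add_flow(priority=3, in_port=port.ofport,
                                dl_vlan=port.vlan_tag, actions='normal')
    finally:
        # Flush all queued flow modifications at once.
        bridge.defer_apply_off()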


> I think we can generalize this discussion and use deferred application for
> ovs-vsctl as well.
> Would you agree with that?
>
> Thanks,
> Salvatore
> 
> 
> On 7 January 2014 14:08, Mathieu Rohon  wrote:
> 
> > I think that Isaku is talking about a more intensive usage of
> > defer_apply_on/off, as is done in the patch from gongysh [1].
> >
> > Isaku, I don't see any reason why this could not be done in
> > process_network_ports, if needed. Moreover, the patch from Edouard [2]
> > resolves multithreading issues while processing defer_apply_off.
> >
> >
> > [1]https://review.openstack.org/#/c/61341/
> > [2]https://review.openstack.org/#/c/63917/
> >
> > On Mon, Jan 6, 2014 at 9:24 PM, Salvatore Orlando 
> > wrote:
> > > This thread is starting to get a bit confusing, at least for people with
> > a
> > > single-pipeline brain like me!
> > >
> > > I am not entirely sure if I understand correctly Isaku's proposal
> > concerning
> > > deferring the application of flow changes.
> > > I think it's worth discussing in a separate thread, and a supporting
> > patch
> > > will help as well; I think that in order to avoid unexpected behaviours,
> > > vlan tagging on the port and flow setup should always be performed at the
> > > same time; if we get a much better performance using a mechanism similar
> > to
> > > iptables' defer_apply, then we should use it.
> > >
> > > Regarding rootwrap. This 6x slowdown, while proving that rootwrap
> > imposes a
> > > non-negligible overhead, should not be used as a sort of proof that
> > > rootwrap makes things 6 times worse! What I've been seeing on the gate
> > and
> > > in my tests are ALRM_CLOCK errors raised by ovs commands, so rootwrap has
> > > little to do with it.
> > >
> > > Still, I think we can say that rootwrap adds about 50ms to each command,
> > > becoming particularly penalising especially for 'fast' commands.
> > > I think the best thing to do, as Joe advises, is a test with rootwrap
> > disabled
> > > on the gate - and I will take care of that.
> > >
> > > On the other hand, I would invite community members to pick up some of
> > the
> > > bugs we've registered for 'less frequent' failures observed during
> > parallel
> > > testing; especially if you're coming to Montreal next week.
> > >
> > > Salvatore
> > >
> > >
> > >
> > > On 6 January 2014 20:31, Jay Pipes  wrote:
> > >>
> > >> On Mon, 2014-01-06 at 11:17 -0800, Joe Gordon wrote:
> > >> >
> > >> >
> > >> >
> > >> > On Mon, Jan 6, 2014 at 10:35 AM, Jay Pipes 
> > wrote:
> > >> > On Mon, 2014-01-06 at 09:56 -0800, Joe Gordon wrote:
> > >> >
> > >> > > What about it? Also those numbers are pretty old at this
> > >> > point. I was
> > >> > > thinking disable rootwrap and run full parallel tempest
> > >> > against it.
> > >> >
> > >> >
> > >> > I think that is a little overkill for what we're trying to do
> > >> > here. We
> > >> > are specifically talking about combining many utils.execute()
> > >> > calls into
> > >> > a single one. I think it's pretty obvious that the latter will
> > >> > be better
> > >> > performing than the first, unless you think that rootwrap has
> > >> > no
> > >> > performance overhead at all?
> > >> >
> > >> >
> > >> > mocking out rootwrap with straight sudo, is a very quick way to
> > >> > approximate the performance benefit of combining many utlils.execute()
> > >> > calls together (at least rootwrap wise).  Also  it would tell us how
> > >> > much of the problem is rootwrap induced and how much is other.
> > >>
> > >> Yes, I understand that, which is what the article I linked earlier
> > >> showed?
> > >>
> > >> % time sudo ip link >/dev/null
> > >> sudo ip link > /dev/null  0.00s user 0.00s system 43% cpu 0.009 total
> > >> % sudo time quantum-rootwrap /etc/quantum/rootwrap.conf ip link
> > >> > /dev/null
> > >> quantum-rootwrap /etc/quantum/rootwrap.conf ip link  > /dev/null  0.04s
> > >> user 0.02s system 87% cpu 0.059 total
> > >>
> > >> A very tiny, non-scientific simple indication that rootwrap is around 6
> > >> times slower than a simple sudo call.

Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-07 Thread Salvatore Orlando
Thanks Mathieu!

I think we should first merge Edouard's patch, which appears to be a
prerequisite.
I think we could benefit a lot by applying this mechanism to
process_network_ports.

However, I am not sure if there could be drawbacks arising from the fact
that the agent would assign the local VLAN tag to the port (either the lvm id
or the DEAD_VLAN tag) and only at the end of the iteration apply the flow
modifications, such as the drop-all rule.
This will probably create a short interval of time in which we might see
unexpected behaviours (such as VMs on the DEAD VLAN being able to communicate
with each other, for instance).

I think we can generalize this discussion and use deferred application for
ovs-vsctl as well.
Would you agree with that?

Thanks,
Salvatore


On 7 January 2014 14:08, Mathieu Rohon  wrote:

> I think that Isaku is talking about a more intensive usage of
> defer_apply_on/off, as is done in the patch from gongysh [1].
>
> Isaku, I don't see any reason why this could not be done in
> process_network_ports, if needed. Moreover, the patch from Edouard [2]
> resolves multithreading issues while processing defer_apply_off.
>
>
> [1]https://review.openstack.org/#/c/61341/
> [2]https://review.openstack.org/#/c/63917/
>
> On Mon, Jan 6, 2014 at 9:24 PM, Salvatore Orlando 
> wrote:
> > This thread is starting to get a bit confusing, at least for people with
> a
> > single-pipeline brain like me!
> >
> > I am not entirely sure if I understand correctly Isaku's proposal
> concerning
> > deferring the application of flow changes.
> > I think it's worth discussing in a separate thread, and a supporting
> patch
> > will help as well; I think that in order to avoid unexpected behaviours,
> > vlan tagging on the port and flow setup should always be performed at the
> > same time; if we get a much better performance using a mechanism similar
> to
> > iptables' defer_apply, then we should use it.
> >
> > Regarding rootwrap. This 6x slowdown, while proving that rootwrap
> imposes a
> > non-negligible overhead, should not be used as a sort of proof that
> > rootwrap makes things 6 times worse! What I've been seeing on the gate
> and
> > in my tests are ALRM_CLOCK errors raised by ovs commands, so rootwrap has
> > little to do with it.
> >
> > Still, I think we can say that rootwrap adds about 50ms to each command,
> > becoming particularly penalising especially for 'fast' commands.
> > I think the best thing to do, as Joe advises, is a test with rootwrap
> disabled
> > on the gate - and I will take care of that.
> >
> > On the other hand, I would invite community members to pick up some of
> the
> > bugs we've registered for 'less frequent' failures observed during
> parallel
> > testing; especially if you're coming to Montreal next week.
> >
> > Salvatore
> >
> >
> >
> > On 6 January 2014 20:31, Jay Pipes  wrote:
> >>
> >> On Mon, 2014-01-06 at 11:17 -0800, Joe Gordon wrote:
> >> >
> >> >
> >> >
> >> > On Mon, Jan 6, 2014 at 10:35 AM, Jay Pipes 
> wrote:
> >> > On Mon, 2014-01-06 at 09:56 -0800, Joe Gordon wrote:
> >> >
> >> > > What about it? Also those numbers are pretty old at this
> >> > point. I was
> >> > > thinking disable rootwrap and run full parallel tempest
> >> > against it.
> >> >
> >> >
> >> > I think that is a little overkill for what we're trying to do
> >> > here. We
> >> > are specifically talking about combining many utils.execute()
> >> > calls into
> >> > a single one. I think it's pretty obvious that the latter will
> >> > be better
> >> > performing than the first, unless you think that rootwrap has
> >> > no
> >> > performance overhead at all?
> >> >
> >> >
> >> > mocking out rootwrap with straight sudo, is a very quick way to
> >> > approximate the performance benefit of combining many utlils.execute()
> >> > calls together (at least rootwrap wise).  Also  it would tell us how
> >> > much of the problem is rootwrap induced and how much is other.
> >>
> >> Yes, I understand that, which is what the article I linked earlier
> >> showed?
> >>
> >> % time sudo ip link >/dev/null
> >> sudo ip link > /dev/null  0.00s user 0.00s system 43% cpu 0.009 total
> >> % sudo time quantum-rootwrap /etc/quantum/rootwrap.conf ip link
> >> > /dev/null
> >> quantum-rootwrap /etc/quantum/rootwrap.conf ip link  > /dev/null  0.04s
> >> user 0.02s system 87% cpu 0.059 total
> >>
> >> A very tiny, non-scientific simple indication that rootwrap is around 6
> >> times slower than a simple sudo call.
> >>
> >> Best,
> >> -jay
> >>
> >>
> >> ___
> >> OpenStack-dev mailing list
> >> OpenStack-dev@lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-07 Thread Mathieu Rohon
I think that Isaku is talking about a more intensive usage of
defer_apply_on/off, as is done in the patch from gongysh [1].

Isaku, I don't see any reason why this could not be done in
process_network_ports, if needed. Moreover, the patch from Edouard [2]
resolves multithreading issues while processing defer_apply_off.


[1]https://review.openstack.org/#/c/61341/
[2]https://review.openstack.org/#/c/63917/

On Mon, Jan 6, 2014 at 9:24 PM, Salvatore Orlando  wrote:
> This thread is starting to get a bit confusing, at least for people with a
> single-pipeline brain like me!
>
> I am not entirely sure if I understand correctly Isaku's proposal concerning
> deferring the application of flow changes.
> I think it's worth discussing in a separate thread, and a supporting patch
> will help as well; I think that in order to avoid unexpected behaviours,
> vlan tagging on the port and flow setup should always be performed at the
> same time; if we get a much better performance using a mechanism similar to
> iptables' defer_apply, then we should use it.
>
> Regarding rootwrap. This 6x slowdown, while proving that rootwrap imposes a
> non-negligible overhead, it should not be used as a sort of proof that
> rootwrap makes things 6 times worse! What I've been seeing on the gate and
> in my tests are ALRM_CLOCK errors raised by ovs commands, so rootwrap has
> little to do with it.
>
> Still, I think we can say that rootwrap adds about 50ms to each command,
> becoming particularly penalising especially for 'fast' commands.
> I think the best thing to do, as Joe advises, is a test with rootwrap disabled
> on the gate - and I will take care of that.
>
> On the other hand, I would invite community members to pick up some of the
> bugs we've registered for 'less frequent' failures observed during parallel
> testing; especially if you're coming to Montreal next week.
>
> Salvatore
>
>
>
> On 6 January 2014 20:31, Jay Pipes  wrote:
>>
>> On Mon, 2014-01-06 at 11:17 -0800, Joe Gordon wrote:
>> >
>> >
>> >
>> > On Mon, Jan 6, 2014 at 10:35 AM, Jay Pipes  wrote:
>> > On Mon, 2014-01-06 at 09:56 -0800, Joe Gordon wrote:
>> >
>> > > What about it? Also those numbers are pretty old at this
>> > point. I was
>> > > thinking disable rootwrap and run full parallel tempest
>> > against it.
>> >
>> >
>> > I think that is a little overkill for what we're trying to do
>> > here. We
>> > are specifically talking about combining many utils.execute()
>> > calls into
>> > a single one. I think it's pretty obvious that the latter will
>> > be better
>> > performing than the first, unless you think that rootwrap has
>> > no
>> > performance overhead at all?
>> >
>> >
>> > mocking out rootwrap with straight sudo, is a very quick way to
>> > approximate the performance benefit of combining many utlils.execute()
>> > calls together (at least rootwrap wise).  Also  it would tell us how
>> > much of the problem is rootwrap induced and how much is other.
>>
>> Yes, I understand that, which is what the article I linked earlier
>> showed?
>>
>> % time sudo ip link >/dev/null
>> sudo ip link > /dev/null  0.00s user 0.00s system 43% cpu 0.009 total
>> % sudo time quantum-rootwrap /etc/quantum/rootwrap.conf ip link
>> > /dev/null
>> quantum-rootwrap /etc/quantum/rootwrap.conf ip link  > /dev/null  0.04s
>> user 0.02s system 87% cpu 0.059 total
>>
>> A very tiny, non-scientific simple indication that rootwrap is around 6
>> times slower than a simple sudo call.
>>
>> Best,
>> -jay
>>
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Jay Pipes
On Mon, 2014-01-06 at 21:24 +0100, Salvatore Orlando wrote:
> This thread is starting to get a bit confusing, at least for people
> with a single-pipeline brain like me!

Heh, point taken.

> 

> On the other hand, I would invite community members to pick up some of
> the bugs we've registered for 'less frequent' failures observed during
> parallel testing; especially if you're coming to Montreal next week.

Mind tagging the bugs with 'less-frequent' or even 'montreal', so we can
quickly search launchpad for them?

Thanks!
-jay



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Salvatore Orlando
This thread is starting to get a bit confusing, at least for people with a
single-pipeline brain like me!

I am not entirely sure if I understand correctly Isaku's proposal
concerning deferring the application of flow changes.
I think it's worth discussing in a separate thread, and a supporting patch
will help as well; I think that in order to avoid unexpected behaviours,
vlan tagging on the port and flow setup should always be performed at the
same time; if we get much better performance using a mechanism similar to
iptables' defer_apply, then we should use it.

Regarding rootwrap. This 6x slowdown, while proving that rootwrap imposes a
non-negligible overhead, should not be used as a sort of proof that
rootwrap makes things 6 times worse! What I've been seeing on the gate and
in my tests are ALRM_CLOCK errors raised by ovs commands, so rootwrap has
little to do with it.

Still, I think we can say that rootwrap adds about 50ms to each command,
becoming particularly penalising for 'fast' commands.
I think the best thing to do, as Joe advises, is a test with rootwrap
disabled on the gate - and I will take care of that.

On the other hand, I would invite community members to pick up some of the
bugs we've registered for 'less frequent' failures observed during parallel
testing, especially if you're coming to Montreal next week.

Salvatore



On 6 January 2014 20:31, Jay Pipes  wrote:

> On Mon, 2014-01-06 at 11:17 -0800, Joe Gordon wrote:
> >
> >
> >
> > On Mon, Jan 6, 2014 at 10:35 AM, Jay Pipes  wrote:
> > On Mon, 2014-01-06 at 09:56 -0800, Joe Gordon wrote:
> >
> > > What about it? Also those numbers are pretty old at this
> > point. I was
> > > thinking disable rootwrap and run full parallel tempest
> > against it.
> >
> >
> > I think that is a little overkill for what we're trying to do
> > here. We
> > are specifically talking about combining many utils.execute()
> > calls into
> > a single one. I think it's pretty obvious that the latter will
> > be better
> > performing than the first, unless you think that rootwrap has
> > no
> > performance overhead at all?
> >
> >
> > mocking out rootwrap with straight sudo, is a very quick way to
> > approximate the performance benefit of combining many utlils.execute()
> > calls together (at least rootwrap wise).  Also  it would tell us how
> > much of the problem is rootwrap induced and how much is other.
>
> Yes, I understand that, which is what the article I linked earlier
> showed?
>
> % time sudo ip link >/dev/null
> sudo ip link > /dev/null  0.00s user 0.00s system 43% cpu 0.009 total
> % sudo time quantum-rootwrap /etc/quantum/rootwrap.conf ip link
> > /dev/null
> quantum-rootwrap /etc/quantum/rootwrap.conf ip link  > /dev/null  0.04s
> user 0.02s system 87% cpu 0.059 total
>
> A very tiny, non-scientific simple indication that rootwrap is around 6
> times slower than a simple sudo call.
>
> Best,
> -jay
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Jay Pipes
On Mon, 2014-01-06 at 11:17 -0800, Joe Gordon wrote:
> 
> 
> 
> On Mon, Jan 6, 2014 at 10:35 AM, Jay Pipes  wrote:
> On Mon, 2014-01-06 at 09:56 -0800, Joe Gordon wrote:
> 
> > What about it? Also those numbers are pretty old at this
> point. I was
> > thinking disable rootwrap and run full parallel tempest
> against it.
> 
> 
> I think that is a little overkill for what we're trying to do
> here. We
> are specifically talking about combining many utils.execute()
> calls into
> a single one. I think it's pretty obvious that the latter will
> be better
> performing than the first, unless you think that rootwrap has
> no
> performance overhead at all?
> 
> 
> mocking out rootwrap with straight sudo, is a very quick way to
> approximate the performance benefit of combining many utlils.execute()
> calls together (at least rootwrap wise).  Also  it would tell us how
> much of the problem is rootwrap induced and how much is other.

Yes, I understand that, which is what the article I linked earlier
showed?

% time sudo ip link >/dev/null
sudo ip link > /dev/null  0.00s user 0.00s system 43% cpu 0.009 total
% sudo time quantum-rootwrap /etc/quantum/rootwrap.conf ip link
> /dev/null
quantum-rootwrap /etc/quantum/rootwrap.conf ip link  > /dev/null  0.04s
user 0.02s system 87% cpu 0.059 total

A very tiny, non-scientific simple indication that rootwrap is around 6
times slower than a simple sudo call.

Best,
-jay
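A rough harness for reproducing this comparison over many runs, rather than a
single 'time' invocation, is sketched below. The commands and the rootwrap
configuration path are taken from the example above and are assumptions about
the local install; it only approximates per-call overhead.

import subprocess
import time

COMMANDS = {
    'plain sudo': ['sudo', 'ip', 'link'],
    'rootwrap': ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf',
                 'ip', 'link'],
}

def average_runtime(cmd, runs=20):
    # Average wall-clock time per invocation, in seconds.
    start = time.time()
    for _ in range(runs):
        subprocess.check_output(cmd)
    return (time.time() - start) / runs

for label, cmd in COMMANDS.items():
    print('%-10s %.1f ms per call' % (label, average_runtime(cmd) * 1000))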


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Joe Gordon
On Mon, Jan 6, 2014 at 10:35 AM, Jay Pipes  wrote:

> On Mon, 2014-01-06 at 09:56 -0800, Joe Gordon wrote:
>
> > What about it? Also those numbers are pretty old at this point. I was
> > thinking disable rootwrap and run full parallel tempest against it.
>
> I think that is a little overkill for what we're trying to do here. We
> are specifically talking about combining many utils.execute() calls into
> a single one. I think it's pretty obvious that the latter will be better
> performing than the first, unless you think that rootwrap has no
> performance overhead at all?
>

Mocking out rootwrap with straight sudo is a very quick way to approximate
the performance benefit of combining many utils.execute() calls together
(at least rootwrap-wise). Also, it would tell us how much of the problem
is rootwrap-induced and how much is other.


>
> -jay
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Jay Pipes
On Mon, 2014-01-06 at 09:56 -0800, Joe Gordon wrote:

> What about it? Also those numbers are pretty old at this point. I was
> thinking disable rootwrap and run full parallel tempest against it.

I think that is a little overkill for what we're trying to do here. We
are specifically talking about combining many utils.execute() calls into
a single one. I think it's pretty obvious that the latter will perform
better than the former, unless you think that rootwrap has no
performance overhead at all?

-jay
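As a concrete illustration of combining many utils.execute() calls into a
single one, the sketch below folds several 'ip' operations into one iproute2
batch invocation, so only one command pays the sudo/rootwrap startup cost.
The helper and the plain-sudo root helper are illustrative assumptions, not
existing Neutron code; the interface names are made up.

import subprocess
import tempfile

def run_ip_batch(commands, root_helper=('sudo',)):
    # Write one ip sub-command per line and apply them all with a single
    # 'ip -batch' call instead of one execute() per command.
    with tempfile.NamedTemporaryFile('w', suffix='.batch') as f:
        f.write('\n'.join(commands) + '\n')
        f.flush()
        subprocess.check_call(list(root_helper) + ['ip', '-batch', f.name])

# Three operations, one fork/exec through the root helper:
run_ip_batch([
    'link add tap-demo0 type veth peer name tap-demo1',
    'link set tap-demo0 up',
    'link set tap-demo1 up',
])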



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Joe Gordon
On Mon, Jan 6, 2014 at 9:37 AM, Jay Pipes  wrote:

> On Mon, 2014-01-06 at 08:58 -0800, Joe Gordon wrote:
>
> > On Mon, Jan 6, 2014 at 8:38 AM, Jay Pipes  wrote:
> > On Mon, 2014-01-06 at 17:04 +0100, Salvatore Orlando wrote:
> > > I have already discussed the matter with Jay on IRC, even if
> > for a
> > > different issue.
> > > In this specific case 'batching' will have the benefit of
> > reducing the
> > > rootwrap overhead.
> >
> >
> > Right.
> >
> > > However, it seems the benefit from batching is not
> > resolutive. I admit
> > > I have not run tests in the gate with batching; I've just
> > tested in an
> > > environment without significant load, obtaining a
> > performance increase
> > > of less than 10%.
> >
> >
> > Well, 10% is 10% better than nothing ;) And add in the
> > (significant)
> > rootwrap costs, and I think it's certainly worth looking into.
> >
> > Have you tried running neutron without rootwrap, to get a baseline?
>
> See:
>
> http://blog.gridcentric.com/bid/318277/Boosting-OpenStack-s-Parallel-Performance
>
> Specifically the section titled "Disable rootwrap scripts",
>
>

What about it? Also, those numbers are pretty old at this point. I was
thinking of disabling rootwrap and running full parallel tempest against it.



> Best,
> -jay
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Jay Pipes
On Mon, 2014-01-06 at 08:58 -0800, Joe Gordon wrote:

> On Mon, Jan 6, 2014 at 8:38 AM, Jay Pipes  wrote:
> On Mon, 2014-01-06 at 17:04 +0100, Salvatore Orlando wrote:
> > I have already discussed the matter with Jay on IRC, even if
> for a
> > different issue.
> > In this specific case 'batching' will have the benefit of
> reducing the
> > rootwrap overhead.
> 
> 
> Right.
> 
> > However, it seems the benefit from batching is not
> resolutive. I admit
> > I have not run tests in the gate with batching; I've just
> tested in an
> > environment without significant load, obtaining a
> performance increase
> > of less than 10%.
> 
> 
> Well, 10% is 10% better than nothing ;) And add in the
> (significant)
> rootwrap costs, and I think it's certainly worth looking into.
> 
> Have you tried running neutron without rootwrap, to get a baseline?

See:
http://blog.gridcentric.com/bid/318277/Boosting-OpenStack-s-Parallel-Performance

Specifically the section titled "Disable rootwrap scripts".

Best,
-jay



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Salvatore Orlando
I have already discussed the matter with Jay on IRC, even if for a
different issue.
In this specific case 'batching' will have the benefit of reducing the
rootwrap overhead.

However, it seems the benefit from batching is not decisive. I admit I
have not run tests in the gate with batching; I've just tested in an
environment without significant load, obtaining a performance increase of
less than 10%.

From what I gathered, even if commands are 'batched' to ovs-vsctl,
operations are still individually performed on the kernel module. I did not
investigate whether the cli command sends a single command or multiple
commands over the ovsdb interface.
Nevertheless, another thing to note is that it's not just ovs-vsctl that
becomes very slow, but also, and more often than that, ovs-ofctl, for which
there is no batching.

Summarising, I'm not opposed to batching for ovs-vsctl, and I would
definitely welcome it; I just don't think it will be the ultimate solution.

Salvatore
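Isaku's suggestion, quoted below, boils down to folding several ovs-vsctl
invocations into one multi-command call, which ovs-vsctl turns into a single
ovsdb transaction. A minimal sketch (plain subprocess rather than the agent's
ovs_lib wrapper, with made-up bridge and port names) could look like this:

import subprocess

def ovs_vsctl_batch(arg_lists, root_helper=('sudo',)):
    # Join several ovs-vsctl commands with '--' so they run in one process
    # and in one ovsdb transaction.
    cmd = list(root_helper) + ['ovs-vsctl']
    for i, args in enumerate(arg_lists):
        if i:
            cmd.append('--')
        cmd.extend(args)
    subprocess.check_call(cmd)

# One invocation instead of three:
ovs_vsctl_batch([
    ['add-port', 'br-int', 'tap-demo0'],
    ['set', 'Port', 'tap-demo0', 'tag=101'],
    ['set', 'Interface', 'tap-demo0', 'external-ids:iface-status=active'],
])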


On 6 January 2014 11:40, Isaku Yamahata  wrote:

> On Fri, Dec 27, 2013 at 11:09:02AM +0100,
> Salvatore Orlando  wrote:
>
> > Hi,
> >
> > We now have several patches under review which improve a lot how neutron
> > handles parallel testing.
> > In a nutshell, these patches try to ensure the ovs agent processes new,
> > removed, and updated interfaces as soon as possible,
> >
> > These patches are:
> > https://review.openstack.org/#/c/61105/
> > https://review.openstack.org/#/c/61964/
> > https://review.openstack.org/#/c/63100/
> > https://review.openstack.org/#/c/63558/
> >
> > There is still room for improvement. For instance the calls from the
> agent
> > into the plugins might be consistently reduced.
> > However, even if the above patches shrink a lot the time required for
> > processing a device, we are still hitting a hard limit with the execution
> > ovs commands for setting local vlan tags and clearing flows (or adding
> the
> > flow rule for dropping all the traffic).
> > In some instances this commands slow down a lot, requiring almost 10
> > seconds to complete. This adds a delay in interface processing which in
> > some cases leads to the hideous SSH timeout error (the same we see with
> bug
> > 1253896 in normal testing).
> > It is also worth noting that when this happens sysstat reveal CPU usage
> is
> > very close to 100%
> >
> > From the neutron side there is little we can do. Introducing parallel
> > processing for interface, as we do for the l3 agent, is not actually a
> > solution, since ovs-vswitchd v1.4.x, the one executed on gate tests, is
> not
> > multithreaded. If you think the situation might be improved by changing
> the
> > logic for handling local vlan tags and putting ports on the dead vlan, I
> > would be happy to talk about that.
>
> How about batching those ovsdb operations?
> Instead of issueing many ovs-vsctl command,
> ovs-vsctl -- command0 [args] -- command1 [args] -- ...
>
> Then, the number of ovs-vsctl will be reduced and ovs-vsctl issues
> only single ovsdb transaction.
> --
> Isaku Yamahata 
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Isaku Yamahata
On Mon, Jan 06, 2014 at 05:04:47PM +0100,
Salvatore Orlando  wrote:

> I have already discussed the matter with Jay on IRC, even if for a
> different issue.
> In this specific case 'batching' will have the benefit of reducing the
> rootwrap overhead.
> 
> However, it seems the benefit from batching is not resolutive. I admit I
> have not run tests in the gate with batching; I've just tested in an
> environment without significant load, obtaining a performance increase of
> less than 10%.
> 
> From what I gathered even if commands are 'batched' to ovs-vsctl,
> operations are still individually performed on the kernel module. I did not
> investigate whether the cli commands sends a single or multiple commands on
> the ovsdb interface.
> Nevertheless, another thing to note is that it's not just ovs-vsctl that
> becomes very slow, but also, and more often than that, ovs-ofctl, for which
> there is no batching.

Then would ovs-ofctl add/mod-flows SWITCH FILE help with defer_apply_off()?
If yes, I'm willing to create such a patch.
add/mod-flows batches add/mod-flow: ovs-ofctl sends an OF barrier message
and waits for its reply to confirm the result.
That means a single barrier synchronization for add/mod-flows versus one
barrier synchronization per add/mod-flow.

The current implementation doesn't have defer_apply_on/off in
process_network_ports(). Is there any reason for that?

Thanks,
Isaku Yamahata
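A rough sketch of what such batching could look like: flow specifications are
queued while deferred and then pushed to the bridge with a single
'ovs-ofctl add-flows' call (one process, one barrier round-trip) instead of
one ovs-ofctl invocation per flow. The class and helper names are
illustrative, not the patch being offered here.

import subprocess
import tempfile

class DeferredFlows(object):
    def __init__(self, bridge, root_helper=('sudo',)):
        self.bridge = bridge
        self.root_helper = list(root_helper)
        self.pending = []

    def add_flow(self, spec):
        # e.g. 'priority=2,in_port=5,actions=drop'
        self.pending.append(spec)

    def apply(self):
        # Flush all queued flows with a single add-flows call.
        if not self.pending:
            return
        with tempfile.NamedTemporaryFile('w') as f:
            f.write('\n'.join(self.pending) + '\n')
            f.flush()
            subprocess.check_call(self.root_helper +
                                  ['ovs-ofctl', 'add-flows',
                                   self.bridge, f.name])
        self.pending = []

flows = DeferredFlows('br-int')
flows.add_flow('priority=2,in_port=5,actions=drop')
flows.add_flow('priority=3,in_port=6,dl_vlan=101,actions=normal')
flows.apply()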


> Summarising, I'm not opposed to batching for ovs-vsctl, and I would
> definitely welcome it; I just don't think it will be the ultimate solution.
> 
> Salvatore
> 
> 
> On 6 January 2014 11:40, Isaku Yamahata  wrote:
> 
> > On Fri, Dec 27, 2013 at 11:09:02AM +0100,
> > Salvatore Orlando  wrote:
> >
> > > Hi,
> > >
> > > We now have several patches under review which improve a lot how neutron
> > > handles parallel testing.
> > > In a nutshell, these patches try to ensure the ovs agent processes new,
> > > removed, and updated interfaces as soon as possible,
> > >
> > > These patches are:
> > > https://review.openstack.org/#/c/61105/
> > > https://review.openstack.org/#/c/61964/
> > > https://review.openstack.org/#/c/63100/
> > > https://review.openstack.org/#/c/63558/
> > >
> > > There is still room for improvement. For instance the calls from the
> > agent
> > > into the plugins might be consistently reduced.
> > > However, even if the above patches shrink a lot the time required for
> > > processing a device, we are still hitting a hard limit with the execution
> > > ovs commands for setting local vlan tags and clearing flows (or adding
> > the
> > > flow rule for dropping all the traffic).
> > > In some instances this commands slow down a lot, requiring almost 10
> > > seconds to complete. This adds a delay in interface processing which in
> > > some cases leads to the hideous SSH timeout error (the same we see with
> > bug
> > > 1253896 in normal testing).
> > > It is also worth noting that when this happens sysstat reveal CPU usage
> > is
> > > very close to 100%
> > >
> > > From the neutron side there is little we can do. Introducing parallel
> > > processing for interface, as we do for the l3 agent, is not actually a
> > > solution, since ovs-vswitchd v1.4.x, the one executed on gate tests, is
> > not
> > > multithreaded. If you think the situation might be improved by changing
> > the
> > > logic for handling local vlan tags and putting ports on the dead vlan, I
> > > would be happy to talk about that.
> >
> > How about batching those ovsdb operations?
> > Instead of issueing many ovs-vsctl command,
> > ovs-vsctl -- command0 [args] -- command1 [args] -- ...
> >
> > Then, the number of ovs-vsctl will be reduced and ovs-vsctl issues
> > only single ovsdb transaction.
> > --
> > Isaku Yamahata 
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >

-- 
Isaku Yamahata 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Joe Gordon
On Mon, Jan 6, 2014 at 8:38 AM, Jay Pipes  wrote:

> On Mon, 2014-01-06 at 17:04 +0100, Salvatore Orlando wrote:
> > I have already discussed the matter with Jay on IRC, even if for a
> > different issue.
> > In this specific case 'batching' will have the benefit of reducing the
> > rootwrap overhead.
>
> Right.
>
> > However, it seems the benefit from batching is not resolutive. I admit
> > I have not run tests in the gate with batching; I've just tested in an
> > environment without significant load, obtaining a performance increase
> > of less than 10%.
>
> Well, 10% is 10% better than nothing ;) And add in the (significant)
> rootwrap costs, and I think it's certainly worth looking into.
>


Have you tried running neutron without rootwrap, to get a baseline?




>
> > From what I gathered even if commands are 'batched' to ovs-vsctl,
> > operations are still individually performed on the kernel module. I
> > did not investigate whether the cli commands sends a single or
> > multiple commands on the ovsdb interface.
> > Nevertheless, another thing to note is that it's not just ovs-vsctl
> > that becomes very slow, but also, and more often than that, ovs-ofctl,
> > for which there is no batching.
>
> Ah, I did not realize ovs-ofctl had no batch mode. That's a shame...
>
> > Summarising, I'm not opposed to batching for ovs-vsctl, and I would
> > definitely welcome it; I just don't think it will be the ultimate
> > solution.
>
> Yep, understood.
>
> Thanks!
> -jay
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Jay Pipes
On Mon, 2014-01-06 at 17:04 +0100, Salvatore Orlando wrote:
> I have already discussed the matter with Jay on IRC, even if for a
> different issue.
> In this specific case 'batching' will have the benefit of reducing the
> rootwrap overhead.

Right.

> However, it seems the benefit from batching is not resolutive. I admit
> I have not run tests in the gate with batching; I've just tested in an
> environment without significant load, obtaining a performance increase
> of less than 10%.

Well, 10% is 10% better than nothing ;) And add in the (significant)
rootwrap costs, and I think it's certainly worth looking into.

> From what I gathered even if commands are 'batched' to ovs-vsctl,
> operations are still individually performed on the kernel module. I
> did not investigate whether the cli commands sends a single or
> multiple commands on the ovsdb interface.
> Nevertheless, another thing to note is that it's not just ovs-vsctl
> that becomes very slow, but also, and more often than that, ovs-ofctl,
> for which there is no batching.

Ah, I did not realize ovs-ofctl had no batch mode. That's a shame...

> Summarising, I'm not opposed to batching for ovs-vsctl, and I would
> definitely welcome it; I just don't think it will be the ultimate
> solution.

Yep, understood.

Thanks!
-jay



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Jay Pipes
On Mon, 2014-01-06 at 19:40 +0900, Isaku Yamahata wrote:
> On Fri, Dec 27, 2013 at 11:09:02AM +0100,
> Salvatore Orlando  wrote:
> 
> > Hi,
> > 
> > We now have several patches under review which improve a lot how neutron
> > handles parallel testing.
> > In a nutshell, these patches try to ensure the ovs agent processes new,
> > removed, and updated interfaces as soon as possible,
> > 
> > These patches are:
> > https://review.openstack.org/#/c/61105/
> > https://review.openstack.org/#/c/61964/
> > https://review.openstack.org/#/c/63100/
> > https://review.openstack.org/#/c/63558/
> > 
> > There is still room for improvement. For instance the calls from the agent
> > into the plugins might be consistently reduced.
> > However, even if the above patches shrink a lot the time required for
> > processing a device, we are still hitting a hard limit with the execution
> > ovs commands for setting local vlan tags and clearing flows (or adding the
> > flow rule for dropping all the traffic).
> > In some instances this commands slow down a lot, requiring almost 10
> > seconds to complete. This adds a delay in interface processing which in
> > some cases leads to the hideous SSH timeout error (the same we see with bug
> > 1253896 in normal testing).
> > It is also worth noting that when this happens sysstat reveal CPU usage is
> > very close to 100%
> > 
> > From the neutron side there is little we can do. Introducing parallel
> > processing for interface, as we do for the l3 agent, is not actually a
> > solution, since ovs-vswitchd v1.4.x, the one executed on gate tests, is not
> > multithreaded. If you think the situation might be improved by changing the
> > logic for handling local vlan tags and putting ports on the dead vlan, I
> > would be happy to talk about that.
> 
> How about batching those ovsdb operations?
> Instead of issueing many ovs-vsctl command,
> ovs-vsctl -- command0 [args] -- command1 [args] -- ...
> 
> Then, the number of ovs-vsctl will be reduced and ovs-vsctl issues
> only single ovsdb transaction.

https://bugs.launchpad.net/neutron/+bug/1264608

:)

-jay



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-06 Thread Isaku Yamahata
On Fri, Dec 27, 2013 at 11:09:02AM +0100,
Salvatore Orlando  wrote:

> Hi,
> 
> We now have several patches under review which improve a lot how neutron
> handles parallel testing.
> In a nutshell, these patches try to ensure the ovs agent processes new,
> removed, and updated interfaces as soon as possible,
> 
> These patches are:
> https://review.openstack.org/#/c/61105/
> https://review.openstack.org/#/c/61964/
> https://review.openstack.org/#/c/63100/
> https://review.openstack.org/#/c/63558/
> 
> There is still room for improvement. For instance the calls from the agent
> into the plugins might be consistently reduced.
> However, even if the above patches shrink a lot the time required for
> processing a device, we are still hitting a hard limit with the execution
> ovs commands for setting local vlan tags and clearing flows (or adding the
> flow rule for dropping all the traffic).
> In some instances this commands slow down a lot, requiring almost 10
> seconds to complete. This adds a delay in interface processing which in
> some cases leads to the hideous SSH timeout error (the same we see with bug
> 1253896 in normal testing).
> It is also worth noting that when this happens sysstat reveal CPU usage is
> very close to 100%
> 
> From the neutron side there is little we can do. Introducing parallel
> processing for interface, as we do for the l3 agent, is not actually a
> solution, since ovs-vswitchd v1.4.x, the one executed on gate tests, is not
> multithreaded. If you think the situation might be improved by changing the
> logic for handling local vlan tags and putting ports on the dead vlan, I
> would be happy to talk about that.

How about batching those ovsdb operations?
Instead of issuing many ovs-vsctl commands, use
ovs-vsctl -- command0 [args] -- command1 [args] -- ...

Then the number of ovs-vsctl invocations will be reduced and ovs-vsctl issues
only a single ovsdb transaction.
-- 
Isaku Yamahata 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-03 Thread Salvatore Orlando
I already have a patch under review for the quota test, for which I adopted
the shortest-diff approach.
As regards Robert's suggestion, the problem we have there is that the test
uses a dedicated tenant, but it does not take into account the possibility
that at some point the dhcp agent will also create a port for that tenant.

In theory I tend to agree with Miguel; but I'm not sure what would be the
consensus on removing a scenario test. I think we either decide to merge
this shortest-diff patch [1] once the comments are addressed, or re-design
the tests, which might take some more time.

Salvatore

PS: shortest-diff is, as you might have already understood, a euphemism
for 'hack'.


[1] https://review.openstack.org/#/c/64217/



On 2 January 2014 22:39, Robert Collins  wrote:

> Another way to tackle it would be to create a dedicated tenant for
> those tests, then the quota won't interact with anything else.
>
> On 3 January 2014 10:35, Miguel Angel Ajo Pelayo 
> wrote:
> > Hi Salvatore!,
> >
> >Good work on this.
> >
> >About the quota limit tests, I believe they may be unit-tested,
> > instead of functionally tested.
> >
> >When running those tests in parallel with any other tests that rely
> > on having ports, networks or subnets available into quota, they have
> > high chances of making those other tests fail.
> >
> > Cheers,
> > Miguel Ángel Ajo
> >
> >
> >
> > - Original Message -
> >> From: "Kyle Mestery" 
> >> To: "OpenStack Development Mailing List (not for usage questions)" <
> openstack-dev@lists.openstack.org>
> >> Sent: Thursday, January 2, 2014 7:53:05 PM
> >> Subject: Re: [openstack-dev] [Neutron][qa] Parallel testing update
> >>
> >> Thanks for the updates here Salvatore, and for continuing to push on
> >> this! This is all great work!
> >>
> >> On Jan 2, 2014, at 6:57 AM, Salvatore Orlando 
> wrote:
> >> >
> >> > Hi again,
> >> >
> >> > I've now run the experimental job a good deal of times, and I've
> filed bugs
> >> > for all the issues which came out.
> >> > Most of them occurred no more than once among all test execution (I
> think
> >> > about 30).
> >> >
> >> > They're all tagged with neutron-parallel [1]. for ease of tracking,
> I've
> >> > associated all the bug reports with neutron, but some are probably
> more
> >> > tempest or nova issues.
> >> >
> >> > Salvatore
> >> >
> >> > [1]
> https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel
> >> >
> >> >
> >> > On 27 December 2013 11:09, Salvatore Orlando 
> wrote:
> >> > Hi,
> >> >
> >> > We now have several patches under review which improve a lot how
> neutron
> >> > handles parallel testing.
> >> > In a nutshell, these patches try to ensure the ovs agent processes
> new,
> >> > removed, and updated interfaces as soon as possible,
> >> >
> >> > These patches are:
> >> > https://review.openstack.org/#/c/61105/
> >> > https://review.openstack.org/#/c/61964/
> >> > https://review.openstack.org/#/c/63100/
> >> > https://review.openstack.org/#/c/63558/
> >> >
> >> > There is still room for improvement. For instance the calls from the
> agent
> >> > into the plugins might be consistently reduced.
> >> > However, even if the above patches shrink a lot the time required for
> >> > processing a device, we are still hitting a hard limit with the
> execution
> >> > ovs commands for setting local vlan tags and clearing flows (or
> adding the
> >> > flow rule for dropping all the traffic).
> >> > In some instances this commands slow down a lot, requiring almost 10
> >> > seconds to complete. This adds a delay in interface processing which
> in
> >> > some cases leads to the hideous SSH timeout error (the same we see
> with
> >> > bug 1253896 in normal testing).
> >> > It is also worth noting that when this happens sysstat reveal CPU
> usage is
> >> > very close to 100%
> >> >
> >> > From the neutron side there is little we can do. Introducing parallel
> >> > processing for interface, as we do for the l3 agent, is not actually a
> >> > solution, since ovs-vswitchd v1.4.x, the one executed on gate tests,
> is
> >> > not multithreaded.

Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-02 Thread Robert Collins
Another way to tackle it would be to create a dedicated tenant for
those tests; then the quota won't interact with anything else.

On 3 January 2014 10:35, Miguel Angel Ajo Pelayo  wrote:
> Hi Salvatore!,
>
>Good work on this.
>
>About the quota limit tests, I believe they may be unit-tested,
> instead of functionally tested.
>
>When running those tests in parallel with any other tests that rely
> on having ports, networks or subnets available into quota, they have
> high chances of making those other tests fail.
>
> Cheers,
> Miguel Ángel Ajo
>
>
>
> - Original Message -
>> From: "Kyle Mestery" 
>> To: "OpenStack Development Mailing List (not for usage questions)" 
>> 
>> Sent: Thursday, January 2, 2014 7:53:05 PM
>> Subject: Re: [openstack-dev] [Neutron][qa] Parallel testing update
>>
>> Thanks for the updates here Salvatore, and for continuing to push on
>> this! This is all great work!
>>
>> On Jan 2, 2014, at 6:57 AM, Salvatore Orlando  wrote:
>> >
>> > Hi again,
>> >
>> > I've now run the experimental job a good deal of times, and I've filed bugs
>> > for all the issues which came out.
>> > Most of them occurred no more than once among all test execution (I think
>> > about 30).
>> >
>> > They're all tagged with neutron-parallel [1]. for ease of tracking, I've
>> > associated all the bug reports with neutron, but some are probably more
>> > tempest or nova issues.
>> >
>> > Salvatore
>> >
>> > [1] https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel
>> >
>> >
>> > On 27 December 2013 11:09, Salvatore Orlando  wrote:
>> > Hi,
>> >
>> > We now have several patches under review which improve a lot how neutron
>> > handles parallel testing.
>> > In a nutshell, these patches try to ensure the ovs agent processes new,
>> > removed, and updated interfaces as soon as possible,
>> >
>> > These patches are:
>> > https://review.openstack.org/#/c/61105/
>> > https://review.openstack.org/#/c/61964/
>> > https://review.openstack.org/#/c/63100/
>> > https://review.openstack.org/#/c/63558/
>> >
>> > There is still room for improvement. For instance the calls from the agent
>> > into the plugins might be consistently reduced.
>> > However, even if the above patches shrink a lot the time required for
>> > processing a device, we are still hitting a hard limit with the execution
>> > ovs commands for setting local vlan tags and clearing flows (or adding the
>> > flow rule for dropping all the traffic).
>> > In some instances this commands slow down a lot, requiring almost 10
>> > seconds to complete. This adds a delay in interface processing which in
>> > some cases leads to the hideous SSH timeout error (the same we see with
>> > bug 1253896 in normal testing).
>> > It is also worth noting that when this happens sysstat reveal CPU usage is
>> > very close to 100%
>> >
>> > From the neutron side there is little we can do. Introducing parallel
>> > processing for interface, as we do for the l3 agent, is not actually a
>> > solution, since ovs-vswitchd v1.4.x, the one executed on gate tests, is
>> > not multithreaded. If you think the situation might be improved by
>> > changing the logic for handling local vlan tags and putting ports on the
>> > dead vlan, I would be happy to talk about that.
>> > On my local machines I've seen a dramatic improvement in processing times
>> > by installing ovs 2.0.0, which has a multi-threaded vswitchd. Is this
>> > something we might consider for gate tests? Also, in order to reduce CPU
>> > usage on the gate (and making tests a bit faster), there is a tempest
>> > patch which stops creating and wiring neutron routers when they're not
>> > needed: https://review.openstack.org/#/c/62962/
>> >
>> > Even in my local setup which succeeds about 85% of times, I'm still seeing
>> > some occurrences of the issue described in [1], which at the end of the
>> > day seems a dnsmasq issue.
>> >
>> > Beyond the 'big' structural problem discussed above, there are some minor
>> > problems with a few tests:
>> >
>> > 1) test_network_quotas.test_create_ports_until_quota_hit  fails about 90%
>> > of times. I think this is because the test itself should be made aware of
>> > parallel execution and asynchronous events, and there is a patch for this
>> > already: https://review.openstack.org/#/c/64217

Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-02 Thread Miguel Angel Ajo Pelayo
Hi Salvatore!, 

   Good work on this.

   About the quota limit tests, I believe they may be unit-tested, 
instead of functionally tested.

   When running those tests in parallel with any other tests that rely
on having ports, networks or subnets available in the quota, they have
a high chance of making those other tests fail.

Cheers,
Miguel Ángel Ajo



- Original Message -
> From: "Kyle Mestery" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Thursday, January 2, 2014 7:53:05 PM
> Subject: Re: [openstack-dev] [Neutron][qa] Parallel testing update
> 
> Thanks for the updates here Salvatore, and for continuing to push on
> this! This is all great work!
> 
> On Jan 2, 2014, at 6:57 AM, Salvatore Orlando  wrote:
> > 
> > Hi again,
> > 
> > I've now run the experimental job a good deal of times, and I've filed bugs
> > for all the issues which came out.
> > Most of them occurred no more than once among all test execution (I think
> > about 30).
> > 
> > They're all tagged with neutron-parallel [1]. for ease of tracking, I've
> > associated all the bug reports with neutron, but some are probably more
> > tempest or nova issues.
> > 
> > Salvatore
> > 
> > [1] https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel
> > 
> > 
> > On 27 December 2013 11:09, Salvatore Orlando  wrote:
> > Hi,
> > 
> > We now have several patches under review which improve a lot how neutron
> > handles parallel testing.
> > In a nutshell, these patches try to ensure the ovs agent processes new,
> > removed, and updated interfaces as soon as possible,
> > 
> > These patches are:
> > https://review.openstack.org/#/c/61105/
> > https://review.openstack.org/#/c/61964/
> > https://review.openstack.org/#/c/63100/
> > https://review.openstack.org/#/c/63558/
> > 
> > There is still room for improvement. For instance the calls from the agent
> > into the plugins might be consistently reduced.
> > However, even if the above patches shrink a lot the time required for
> > processing a device, we are still hitting a hard limit with the execution
> > ovs commands for setting local vlan tags and clearing flows (or adding the
> > flow rule for dropping all the traffic).
> > In some instances this commands slow down a lot, requiring almost 10
> > seconds to complete. This adds a delay in interface processing which in
> > some cases leads to the hideous SSH timeout error (the same we see with
> > bug 1253896 in normal testing).
> > It is also worth noting that when this happens sysstat reveal CPU usage is
> > very close to 100%
> > 
> > From the neutron side there is little we can do. Introducing parallel
> > processing for interface, as we do for the l3 agent, is not actually a
> > solution, since ovs-vswitchd v1.4.x, the one executed on gate tests, is
> > not multithreaded. If you think the situation might be improved by
> > changing the logic for handling local vlan tags and putting ports on the
> > dead vlan, I would be happy to talk about that.
> > On my local machines I've seen a dramatic improvement in processing times
> > by installing ovs 2.0.0, which has a multi-threaded vswitchd. Is this
> > something we might consider for gate tests? Also, in order to reduce CPU
> > usage on the gate (and making tests a bit faster), there is a tempest
> > patch which stops creating and wiring neutron routers when they're not
> > needed: https://review.openstack.org/#/c/62962/
> > 
> > Even in my local setup which succeeds about 85% of times, I'm still seeing
> > some occurrences of the issue described in [1], which at the end of the
> > day seems a dnsmasq issue.
> > 
> > Beyond the 'big' structural problem discussed above, there are some minor
> > problems with a few tests:
> > 
> > 1) test_network_quotas.test_create_ports_until_quota_hit  fails about 90%
> > of times. I think this is because the test itself should be made aware of
> > parallel execution and asynchronous events, and there is a patch for this
> > already: https://review.openstack.org/#/c/64217
> > 
> > 2) test_attach_interfaces.test_create_list_show_delete_interfaces fails
> > about 66% of times. The failure is always on an assertion made after
> > deletion of interfaces, which probably means the interface is not deleted
> > within 5 seconds. I think this might be a consequence of the higher load
> > on the neutron service and we might try to enable multiple workers on the
> > gate to this aim, or 

Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-02 Thread Kyle Mestery
Thanks for the updates here Salvatore, and for continuing to push on
this! This is all great work!

On Jan 2, 2014, at 6:57 AM, Salvatore Orlando  wrote:
> 
> Hi again,
> 
> I've now run the experimental job a good deal of times, and I've filed bugs 
> for all the issues which came out.
> Most of them occurred no more than once among all test execution (I think 
> about 30).
> 
> They're all tagged with neutron-parallel [1]. for ease of tracking, I've 
> associated all the bug reports with neutron, but some are probably more 
> tempest or nova issues.
> 
> Salvatore
> 
> [1] https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel
> 
> 
> On 27 December 2013 11:09, Salvatore Orlando  wrote:
> Hi,
> 
> We now have several patches under review which improve a lot how neutron 
> handles parallel testing.
> In a nutshell, these patches try to ensure the ovs agent processes new, 
> removed, and updated interfaces as soon as possible,
> 
> These patches are:
> https://review.openstack.org/#/c/61105/
> https://review.openstack.org/#/c/61964/
> https://review.openstack.org/#/c/63100/
> https://review.openstack.org/#/c/63558/
> 
> There is still room for improvement. For instance the calls from the agent 
> into the plugins might be consistently reduced.
> However, even if the above patches shrink a lot the time required for 
> processing a device, we are still hitting a hard limit with the execution ovs 
> commands for setting local vlan tags and clearing flows (or adding the flow 
> rule for dropping all the traffic).
> In some instances this commands slow down a lot, requiring almost 10 seconds 
> to complete. This adds a delay in interface processing which in some cases 
> leads to the hideous SSH timeout error (the same we see with bug 1253896 in 
> normal testing).
> It is also worth noting that when this happens sysstat reveal CPU usage is 
> very close to 100%
> 
> From the neutron side there is little we can do. Introducing parallel 
> processing for interface, as we do for the l3 agent, is not actually a 
> solution, since ovs-vswitchd v1.4.x, the one executed on gate tests, is not 
> multithreaded. If you think the situation might be improved by changing the 
> logic for handling local vlan tags and putting ports on the dead vlan, I 
> would be happy to talk about that.
> On my local machines I've seen a dramatic improvement in processing times by 
> installing ovs 2.0.0, which has a multi-threaded vswitchd. Is this something 
> we might consider for gate tests? Also, in order to reduce CPU usage on the 
> gate (and making tests a bit faster), there is a tempest patch which stops 
> creating and wiring neutron routers when they're not needed: 
> https://review.openstack.org/#/c/62962/
> 
> Even in my local setup which succeeds about 85% of times, I'm still seeing 
> some occurrences of the issue described in [1], which at the end of the day 
> seems a dnsmasq issue.
> 
> Beyond the 'big' structural problem discussed above, there are some minor 
> problems with a few tests:
> 
> 1) test_network_quotas.test_create_ports_until_quota_hit  fails about 90% of 
> times. I think this is because the test itself should be made aware of 
> parallel execution and asynchronous events, and there is a patch for this 
> already: https://review.openstack.org/#/c/64217
> 
> 2) test_attach_interfaces.test_create_list_show_delete_interfaces fails about 
> 66% of times. The failure is always on an assertion made after deletion of 
> interfaces, which probably means the interface is not deleted within 5 
> seconds. I think this might be a consequence of the higher load on the 
> neutron service and we might try to enable multiple workers on the gate to 
> this aim, or just increase the tempest timeout. On a slightly different note, 
> allow me to say that the way assertion are made on this test might be 
> improved a bit. So far one has to go through the code to see why the test 
> failed.
> 
> Thanks for reading this rather long message.
> Regards,
> Salvatore
> 
> [1] https://lists.launchpad.net/openstack/msg23817.html
> 
> 
> 
> 
> On 2 December 2013 22:01, Kyle Mestery (kmestery)  wrote:
> Yes, this is all great Salvatore and Armando! Thank you for all of this work
> and the explanation behind it all.
> 
> Kyle
> 
> On Dec 2, 2013, at 2:24 PM, Eugene Nikanorov  wrote:
> 
> > Salvatore and Armando, thanks for your great work and detailed explanation!
> >
> > Eugene.
> >
> >
> > On Mon, Dec 2, 2013 at 11:48 PM, Joe Gordon  wrote:
> >
> > On Dec 2, 2013 9:04 PM, "Salvatore Orlando"  wrote:
> > >
> > > Hi,
> > >
> > > As you might have noticed, there has been some progress on parallel tests 
> > > for neutron.
> > > In a nutshell:
> > > * Armando fixed the issue with IP address exhaustion on the public 
> > > network [1]
> > > * Salvatore has now a patch which has a 50% success rate (the last 
> > > failures are because of me playing with it) [2]
> > > * Salvatore is looking at putting back on track full isolation 

Re: [openstack-dev] [Neutron][qa] Parallel testing update

2014-01-02 Thread Salvatore Orlando
Hi again,

I've now run the experimental job a good number of times, and I've filed bugs
for all the issues that came out.
Most of them occurred no more than once across all test executions (about 30
runs, I think).

They're all tagged with neutron-parallel [1]. For ease of tracking, I've
associated all the bug reports with neutron, but some are probably more
tempest or nova issues.

Salvatore

[1] https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel


On 27 December 2013 11:09, Salvatore Orlando  wrote:

> Hi,
>
> We now have several patches under review which considerably improve how
> neutron handles parallel testing.
> In a nutshell, these patches try to ensure the ovs agent processes new,
> removed, and updated interfaces as soon as possible.
>
> These patches are:
> https://review.openstack.org/#/c/61105/
> https://review.openstack.org/#/c/61964/
> https://review.openstack.org/#/c/63100/
> https://review.openstack.org/#/c/63558/
>
> There is still room for improvement. For instance, the calls from the agent
> into the plugins could be reduced considerably.
> However, even if the above patches greatly shrink the time required for
> processing a device, we are still hitting a hard limit with the execution of
> ovs commands for setting local vlan tags and clearing flows (or adding the
> flow rule for dropping all the traffic).
> In some instances these commands slow down a lot, requiring almost 10
> seconds to complete. This adds a delay in interface processing which in
> some cases leads to the hideous SSH timeout error (the same we see with bug
> 1253896 in normal testing).
> It is also worth noting that when this happens sysstat reveals CPU usage
> very close to 100%.
>
> From the neutron side there is little we can do. Introducing parallel
> processing for interfaces, as we do for the l3 agent, is not actually a
> solution, since ovs-vswitchd v1.4.x, the one executed on gate tests, is not
> multithreaded. If you think the situation might be improved by changing the
> logic for handling local vlan tags and putting ports on the dead vlan, I
> would be happy to talk about that.
> On my local machines I've seen a dramatic improvement in processing times
> by installing ovs 2.0.0, which has a multi-threaded vswitchd. Is this
> something we might consider for gate tests? Also, in order to reduce CPU
> usage on the gate (and making tests a bit faster), there is a tempest patch
> which stops creating and wiring neutron routers when they're not needed:
> https://review.openstack.org/#/c/62962/
>
> Even in my local setup, which succeeds about 85% of the time, I'm still seeing
> some occurrences of the issue described in [1], which at the end of the day
> seems to be a dnsmasq issue.
>
> Beyond the 'big' structural problem discussed above, there are some minor
> problems with a few tests:
>
> 1) test_network_quotas.test_create_ports_until_quota_hit fails about 90%
> of the time. I think this is because the test itself should be made aware of
> parallel execution and asynchronous events, and there is a patch for this
> already: https://review.openstack.org/#/c/64217
>
> 2) test_attach_interfaces.test_create_list_show_delete_interfaces fails
> about 66% of the time. The failure is always on an assertion made after
> deletion of interfaces, which probably means the interface is not deleted
> within 5 seconds. I think this might be a consequence of the higher load on
> the neutron service; we might try to enable multiple workers on the gate
> to that end, or just increase the tempest timeout. On a slightly different
> note, allow me to say that the way assertions are made in this test might be
> improved a bit. So far one has to go through the code to see why the test
> failed.
>
> Thanks for reading this rather long message.
> Regards,
> Salvatore
>
> [1] https://lists.launchpad.net/openstack/msg23817.html
>
>
>
>
> On 2 December 2013 22:01, Kyle Mestery (kmestery) wrote:
>
>> Yes, this is all great Salvatore and Armando! Thank you for all of this
>> work
>> and the explanation behind it all.
>>
>> Kyle
>>
>> On Dec 2, 2013, at 2:24 PM, Eugene Nikanorov 
>> wrote:
>>
>> > Salvatore and Armando, thanks for your great work and detailed
>> explanation!
>> >
>> > Eugene.
>> >
>> >
>> > On Mon, Dec 2, 2013 at 11:48 PM, Joe Gordon 
>> wrote:
>> >
>> > On Dec 2, 2013 9:04 PM, "Salvatore Orlando" 
>> wrote:
>> > >
>> > > Hi,
>> > >
>> > > As you might have noticed, there has been some progress on parallel
>> tests for neutron.
>> > > In a nutshell:
>> > > * Armando fixed the issue with IP address exhaustion on the public
>> network [1]
>> > > * Salvatore has now a patch which has a 50% success rate (the last
>> failures are because of me playing with it) [2]
>> > > * Salvatore is looking at putting back on track full isolation [3]
>> > > * All the bugs affecting parallel tests can be queried here [10]
>> > > * This blueprint tracks progress made towards enabling parallel
>> testing [11]
>> > >
>> > > -
>> > > The long story is as fol

Re: [openstack-dev] [Neutron][qa] Parallel testing update

2013-12-27 Thread Salvatore Orlando
Hi,

We now have several patches under review which considerably improve how
neutron handles parallel testing.
In a nutshell, these patches try to ensure the ovs agent processes new,
removed, and updated interfaces as soon as possible.

These patches are:
https://review.openstack.org/#/c/61105/
https://review.openstack.org/#/c/61964/
https://review.openstack.org/#/c/63100/
https://review.openstack.org/#/c/63558/

There is still room for improvement. For instance, the calls from the agent
into the plugins could be reduced considerably.
However, even if the above patches greatly shrink the time required for
processing a device, we are still hitting a hard limit with the execution of
ovs commands for setting local vlan tags and clearing flows (or adding the
flow rule for dropping all the traffic).
In some instances these commands slow down a lot, requiring almost 10
seconds to complete. This adds a delay in interface processing which in
some cases leads to the hideous SSH timeout error (the same we see with bug
1253896 in normal testing).
It is also worth noting that when this happens sysstat reveals CPU usage
very close to 100%.
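
To give a concrete idea of the per-port cost, here is a rough sketch (not
agent code; it assumes an integration bridge br-int and a port tap-example
exist, and uses plain sudo where the agent would go through rootwrap) of the
two ovs invocations needed for every port, i.e. tagging it with its local
vlan and installing the drop-all flow:

    import subprocess
    import time

    def timed(cmd):
        start = time.time()
        subprocess.check_call(cmd)   # the agent runs the same command via rootwrap
        return time.time() - start

    # tag the port on the integration bridge with its local vlan...
    t_tag = timed(["sudo", "ovs-vsctl", "set", "Port", "tap-example", "tag=101"])
    # ...and drop all traffic from that port until its flows are properly set up
    t_drop = timed(["sudo", "ovs-ofctl", "add-flow", "br-int",
                    "priority=2,in_port=5,actions=drop"])
    print("ovs-vsctl took %.2fs, ovs-ofctl took %.2fs" % (t_tag, t_drop))

When each of these takes seconds instead of milliseconds, the whole device
processing loop stalls accordingly.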

From the neutron side there is little we can do. Introducing parallel
processing for interfaces, as we do for the l3 agent, is not actually a
solution, since ovs-vswitchd v1.4.x, the one executed on gate tests, is not
multithreaded. If you think the situation might be improved by changing the
logic for handling local vlan tags and putting ports on the dead vlan, I
would be happy to talk about that.
On my local machines I've seen a dramatic improvement in processing times
by installing ovs 2.0.0, which has a multi-threaded vswitchd. Is this
something we might consider for gate tests? Also, in order to reduce CPU
usage on the gate (and making tests a bit faster), there is a tempest patch
which stops creating and wiring neutron routers when they're not needed:
https://review.openstack.org/#/c/62962/

Even in my local setup, which succeeds about 85% of the time, I'm still seeing
some occurrences of the issue described in [1], which at the end of the day
seems to be a dnsmasq issue.

Beyond the 'big' structural problem discussed above, there are some minor
problems with a few tests:

1) test_network_quotas.test_create_ports_until_quota_hit fails about 90%
of the time. I think this is because the test itself should be made aware of
parallel execution and asynchronous events, and there is a patch for this
already: https://review.openstack.org/#/c/64217
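
Roughly, the change meant here is for the test to assert only that the quota
is eventually enforced, since other tests running in parallel may already be
consuming part of it. A minimal sketch, with a hypothetical client and
exception name rather than the actual tempest code:

    class QuotaExceeded(Exception):
        # hypothetical stand-in for the API's "quota exceeded" error
        pass

    def test_create_ports_until_quota_hit(client, network_id, port_quota):
        # Other tests may be eating into the same quota concurrently, so only
        # check that the quota error eventually shows up, not that it shows up
        # after exactly port_quota successful creations.
        created = []
        try:
            for _ in range(port_quota + 1):
                created.append(client.create_port(network_id))
            raise AssertionError("port quota was never enforced")
        except QuotaExceeded:
            pass
        finally:
            for port in created:
                client.delete_port(port)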

2) test_attach_interfaces.test_create_list_show_delete_interfaces fails
about 66% of the time. The failure is always on an assertion made after
deletion of interfaces, which probably means the interface is not deleted
within 5 seconds. I think this might be a consequence of the higher load on
the neutron service; we might try to enable multiple workers on the gate
to that end, or just increase the tempest timeout. On a slightly different
note, allow me to say that the way assertions are made in this test might be
improved a bit. So far one has to go through the code to see why the test
failed.
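
For example, a small polling helper along these lines (a sketch only, with a
hypothetical show_interface callable and NotFound exception) would both
tolerate a slower neutron service and report an explicit reason when it
gives up:

    import time

    class NotFound(Exception):
        # hypothetical "resource no longer exists" error from the client
        pass

    def wait_for_interface_deletion(show_interface, port_id,
                                    timeout=60, interval=2):
        # Poll until the interface is gone instead of assuming it disappears
        # within a fixed 5 seconds, and fail with an explicit message so the
        # reason is visible without reading the test code.
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                show_interface(port_id)
            except NotFound:
                return
            time.sleep(interval)
        raise AssertionError("interface %s still attached after %s seconds"
                             % (port_id, timeout))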

Thanks for reading this rather long message.
Regards,
Salvatore

[1] https://lists.launchpad.net/openstack/msg23817.html




On 2 December 2013 22:01, Kyle Mestery (kmestery) wrote:

> Yes, this is all great Salvatore and Armando! Thank you for all of this
> work
> and the explanation behind it all.
>
> Kyle
>
> On Dec 2, 2013, at 2:24 PM, Eugene Nikanorov 
> wrote:
>
> > Salvatore and Armando, thanks for your great work and detailed
> explanation!
> >
> > Eugene.
> >
> >
> > On Mon, Dec 2, 2013 at 11:48 PM, Joe Gordon 
> wrote:
> >
> > On Dec 2, 2013 9:04 PM, "Salvatore Orlando"  wrote:
> > >
> > > Hi,
> > >
> > > As you might have noticed, there has been some progress on parallel
> tests for neutron.
> > > In a nutshell:
> > > * Armando fixed the issue with IP address exhaustion on the public
> network [1]
> > > * Salvatore has now a patch which has a 50% success rate (the last
> failures are because of me playing with it) [2]
> > > * Salvatore is looking at putting back on track full isolation [3]
> > > * All the bugs affecting parallel tests can be queried here [10]
> > > * This blueprint tracks progress made towards enabling parallel
> testing [11]
> > >
> > > -
> > > The long story is as follows:
> > > Parallel testing basically is not working because parallelism means
> higher contention for public IP addresses. This was made worse by the fact
> that some tests created a router with a gateway set but never deleted it.
> As a result, there were even less addresses in the public range.
> > > [1] was already merged and with [4] we shall make the public network
> for neutron a /24 (the full tempest suite is still showing a lot of IP
> exhaustion errors).
> > >
> > > However, this was just one part of the issue. The biggest part
> actually lay with the OVS agent and its interactions with the ML2 plugin.
> A few patches ([5], [6], [7]) were already pushed to reduce 

Re: [openstack-dev] [Neutron][qa] Parallel testing update

2013-12-02 Thread Kyle Mestery (kmestery)
Yes, this is all great Salvatore and Armando! Thank you for all of this work
and the explanation behind it all.

Kyle

On Dec 2, 2013, at 2:24 PM, Eugene Nikanorov  wrote:

> Salvatore and Armando, thanks for your great work and detailed explanation!
> 
> Eugene.
> 
> 
> On Mon, Dec 2, 2013 at 11:48 PM, Joe Gordon  wrote:
> 
> On Dec 2, 2013 9:04 PM, "Salvatore Orlando"  wrote:
> >
> > Hi,
> >
> > As you might have noticed, there has been some progress on parallel tests 
> > for neutron.
> > In a nutshell:
> > * Armando fixed the issue with IP address exhaustion on the public network 
> > [1]
> > * Salvatore has now a patch which has a 50% success rate (the last failures 
> > are because of me playing with it) [2]
> > * Salvatore is looking at putting back on track full isolation [3]
> > * All the bugs affecting parallel tests can be queried here [10]
> > * This blueprint tracks progress made towards enabling parallel testing [11]
> >
> > -
> > The long story is as follows:
> > Parallel testing basically is not working because parallelism means higher 
> > contention for public IP addresses. This was made worse by the fact that 
> > some tests created a router with a gateway set but never deleted it. As a 
> > result, there were even less addresses in the public range.
> > [1] was already merged and with [4] we shall make the public network for 
> > neutron a /24 (the full tempest suite is still showing a lot of IP 
> > exhaustion errors).
> >
> > However, this was just one part of the issue. The biggest part actually 
> > lay with the OVS agent and its interactions with the ML2 plugin. A few 
> > patches ([5], [6], [7]) were already pushed to reduce the number of 
> > notifications sent from the plugin to the agent. However, the agent is 
> > organised in a way such that a notification is immediately acted upon thus 
> > preempting the main agent loop, which is the one responsible for wiring 
> > ports into networks. Considering the high level of notifications currently 
> > sent from the server, this becomes particularly wasteful if one considers 
> > that security group membership updates for ports trigger global 
> > iptables-save/restore commands which are often executed in rapid 
> > succession, thus resulting in long delays for wiring VIFs to the 
> > appropriate network.
> > With the patch [2] we are refactoring the agent to make it more efficient. 
> > This is not production code, but once we'll get close to 100% pass for 
> > parallel testing this patch will be split in several patches, properly 
> > structured, and hopefully easy to review.
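
A rough sketch of the direction of that refactoring (schematic only, with
made-up names, not the actual agent code): notification handlers merely
record which ports changed, and the main loop drains the whole batch once
per iteration, so a burst of notifications costs one security group refresh
instead of one iptables-save/restore each:

    import threading

    class PortUpdateBuffer(object):
        def __init__(self):
            self._lock = threading.Lock()
            self._updated_ports = set()

        def port_update(self, port_id):
            # called from the RPC notification handler: just record the port
            with self._lock:
                self._updated_ports.add(port_id)

        def drain(self):
            # called once per main-loop iteration; the caller then rewires and
            # refreshes security groups for the whole batch in a single pass
            with self._lock:
                pending, self._updated_ports = self._updated_ports, set()
                return pending
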
> > It is worth noting there is still work to do: in some cases the loop still 
> > takes too long, and ovs commands have been observed taking up to 10 
> > seconds to complete. To this end, it is worth considering the use of async 
> > processes introduced in [8] as well as leveraging ovsdb monitoring [9] to 
> > limit queries to the ovs database.
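
As an illustration of the ovsdb monitoring idea in [9] (a sketch only; the
exact ovsdb-client invocation may vary across ovs versions, and the output
parsing is deliberately omitted), a single long-running monitor process can
report interface changes instead of the agent polling ovs-vsctl on every
iteration:

    import subprocess

    def monitor_interfaces():
        # Each line printed by ovsdb-client describes Interface rows being
        # added, removed or modified on the local ovsdb.
        proc = subprocess.Popen(
            ["ovsdb-client", "monitor", "Interface", "name,ofport",
             "--format=json"],
            stdout=subprocess.PIPE)
        for line in iter(proc.stdout.readline, b""):
            yield line.strip()

    # for event in monitor_interfaces():
    #     mark the affected ports for (re)processing in the next iteration
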
> > We're still unable to explain some failures where the network appears to be 
> > correctly wired (floating IP, router port, dhcp port, and VIF port), but 
> > the SSH connection fails. We're hoping to reproduce this failure pattern 
> > locally.
> >
> > Finally, the tempest patch for full tempest isolation should be made usable 
> > soon. Having another experimental job for it is something worth considering, 
> > as for some reason it is not always easy to reproduce the same failure modes 
> > exhibited on the gate.
> >
> > Regards,
> > Salvatore
> >
> 
> Awesome work, thanks for the update.
> 
> 
> > [1] https://review.openstack.org/#/c/58054/
> > [2] https://review.openstack.org/#/c/57420/
> > [3] https://review.openstack.org/#/c/53459/
> > [4] https://review.openstack.org/#/c/58284/
> > [5] https://review.openstack.org/#/c/58860/
> > [6] https://review.openstack.org/#/c/58597/
> > [7] https://review.openstack.org/#/c/58415/
> > [8] https://review.openstack.org/#/c/45676/
> > [9] https://bugs.launchpad.net/neutron/+bug/1177973
> > [10] 
> > https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel&field.tags_combinator=ANY
> > [11] https://blueprints.launchpad.net/neutron/+spec/neutron-tempest-parallel
> >


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2013-12-02 Thread Eugene Nikanorov
Salvatore and Armando, thanks for your great work and detailed explanation!

Eugene.


On Mon, Dec 2, 2013 at 11:48 PM, Joe Gordon  wrote:

>
> On Dec 2, 2013 9:04 PM, "Salvatore Orlando"  wrote:
> >
> > Hi,
> >
> > As you might have noticed, there has been some progress on parallel
> tests for neutron.
> > In a nutshell:
> > * Armando fixed the issue with IP address exhaustion on the public
> network [1]
> > * Salvatore has now a patch which has a 50% success rate (the last
> failures are because of me playing with it) [2]
> > * Salvatore is looking at putting back on track full isolation [3]
> > * All the bugs affecting parallel tests can be queried here [10]
> > * This blueprint tracks progress made towards enabling parallel testing
> [11]
> >
> > -
> > The long story is as follows:
> > Parallel testing basically is not working because parallelism means
> higher contention for public IP addresses. This was made worse by the fact
> that some tests created a router with a gateway set but never deleted it.
> As a result, there were even less addresses in the public range.
> > [1] was already merged and with [4] we shall make the public network for
> neutron a /24 (the full tempest suite is still showing a lot of IP
> exhaustion errors).
> >
> > However, this was just one part of the issue. The biggest part actually
> lay with the OVS agent and its interactions with the ML2 plugin. A few
> patches ([5], [6], [7]) were already pushed to reduce the number of
> notifications sent from the plugin to the agent. However, the agent is
> organised in a way such that a notification is immediately acted upon thus
> preempting the main agent loop, which is the one responsible for wiring
> ports into networks. Considering the high level of notifications currently
> sent from the server, this becomes particularly wasteful if one considers
> that security group membership updates for ports trigger global
> iptables-save/restore commands which are often executed in rapid
> succession, thus resulting in long delays for wiring VIFs to the
> appropriate network.
> > With the patch [2] we are refactoring the agent to make it more
> efficient. This is not production code, but once we'll get close to 100%
> pass for parallel testing this patch will be split in several patches,
> properly structured, and hopefully easy to review.
> > It is worth noting there is still work to do: in some cases the loop
> still takes too long, and ovs commands have been observed taking up to 10
> seconds to complete. To this end, it is worth considering the use of async
> processes introduced in [8] as well as leveraging ovsdb monitoring [9] to
> limit queries to the ovs database.
> > We're still unable to explain some failures where the network appears to
> be correctly wired (floating IP, router port, dhcp port, and VIF port), but
> the SSH connection fails. We're hoping to reproduce this failure pattern
> locally.
> >
> > Finally, the tempest patch for full tempest isolation should be made
> usable soon. Having another experimental job for it is something worth
> considering, as for some reason it is not always easy to reproduce the same
> failure modes exhibited on the gate.
> >
> > Regards,
> > Salvatore
> >
>
> Awesome work, thanks for the update.
>
> > [1] https://review.openstack.org/#/c/58054/
> > [2] https://review.openstack.org/#/c/57420/
> > [3] https://review.openstack.org/#/c/53459/
> > [4] https://review.openstack.org/#/c/58284/
> > [5] https://review.openstack.org/#/c/58860/
> > [6] https://review.openstack.org/#/c/58597/
> > [7] https://review.openstack.org/#/c/58415/
> > [8] https://review.openstack.org/#/c/45676/
> > [9] https://bugs.launchpad.net/neutron/+bug/1177973
> > [10]
> https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel&field.tags_combinator=ANY
> > [11]
> https://blueprints.launchpad.net/neutron/+spec/neutron-tempest-parallel
> >


Re: [openstack-dev] [Neutron][qa] Parallel testing update

2013-12-02 Thread Joe Gordon
On Dec 2, 2013 9:04 PM, "Salvatore Orlando"  wrote:
>
> Hi,
>
> As you might have noticed, there has been some progress on parallel tests
for neutron.
> In a nutshell:
> * Armando fixed the issue with IP address exhaustion on the public
network [1]
> * Salvatore has now a patch which has a 50% success rate (the last
failures are because of me playing with it) [2]
> * Salvatore is looking at putting back on track full isolation [3]
> * All the bugs affecting parallel tests can be queried here [10]
> * This blueprint tracks progress made towards enabling parallel testing
[11]
>
> -
> The long story is as follows:
> Parallel testing basically is not working because parallelism means
higher contention for public IP addresses. This was made worse by the fact
that some tests created a router with a gateway set but never deleted it.
As a result, there were even less addresses in the public range.
> [1] was already merged and with [4] we shall make the public network for
neutron a /24 (the full tempest suite is still showing a lot of IP
exhaustion errors).
>
> However, this was just one part of the issue. The biggest part actually
lay with the OVS agent and its interactions with the ML2 plugin. A few
patches ([5], [6], [7]) were already pushed to reduce the number of
notifications sent from the plugin to the agent. However, the agent is
organised in a way such that a notification is immediately acted upon thus
preempting the main agent loop, which is the one responsible for wiring
ports into networks. Considering the high level of notifications currently
sent from the server, this becomes particularly wasteful if one considers
that security group membership updates for ports trigger global
iptables-save/restore commands which are often executed in rapid
succession, thus resulting in long delays for wiring VIFs to the
appropriate network.
> With the patch [2] we are refactoring the agent to make it more
efficient. This is not production code, but once we'll get close to 100%
pass for parallel testing this patch will be split in several patches,
properly structured, and hopefully easy to review.
> It is worth noting there is still work to do: in some cases the loop
still takes too long, and ovs commands have been observed taking up to 10
seconds to complete. To this end, it is worth considering the use of async
processes introduced in [8] as well as leveraging ovsdb monitoring [9] to
limit queries to the ovs database.
> We're still unable to explain some failures where the network appears to
be correctly wired (floating IP, router port, dhcp port, and VIF port), but
the SSH connection fails. We're hoping to reproduce this failure pattern
locally.
>
> Finally, the tempest patch for full tempest isolation should be made
usable soon. Having another experimental job for it is something worth
considering, as for some reason it is not always easy to reproduce the same
failure modes exhibited on the gate.
>
> Regards,
> Salvatore
>

Awesome work, thanks for the update.

> [1] https://review.openstack.org/#/c/58054/
> [2] https://review.openstack.org/#/c/57420/
> [3] https://review.openstack.org/#/c/53459/
> [4] https://review.openstack.org/#/c/58284/
> [5] https://review.openstack.org/#/c/58860/
> [6] https://review.openstack.org/#/c/58597/
> [7] https://review.openstack.org/#/c/58415/
> [8] https://review.openstack.org/#/c/45676/
> [9] https://bugs.launchpad.net/neutron/+bug/1177973
> [10]
https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel&field.tags_combinator=ANY
> [11]
https://blueprints.launchpad.net/neutron/+spec/neutron-tempest-parallel
>