[Openstack-operators] [manila] manila operator's feedback forum etherpad available

2018-05-17 Thread Goutham Pacha Ravi
Cross posting from Openstack-dev because Tom's unable to post to this
list yet. Manila operators, please note the session at the Forum next
week.

Thanks,
Goutham

-- Forwarded message --
From: Tom Barron 
Date: Thu, May 17, 2018 at 10:57 AM
Subject: [openstack-dev] [manila] manila operator's feedback forum
etherpad available
To: openstack-operators@lists.openstack.org, openstack-...@lists.openstack.org


Next week at the Summit there is a forum session dedicated to Manila
operators' feedback on Thursday from 1:50 to 2:30pm [1], for which we
have started an etherpad [2].  Please come and help manila developers
do the right thing!  We're particularly interested in experiences
running the OpenStack share service at scale and in overcoming any
obstacles to deployment, but we welcome any and all feedback from real
deployments so that we can tailor our development and maintenance
efforts to real-world needs.

Please feel free and encouraged to add to the etherpad starting now.

See you there!

-- Tom Barron
  Manila PTL
  irc: tbarron

[1] 
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21780/manila-ops-feedback-running-at-scale-overcoming-barriers-to-deployment
[2] https://etherpad.openstack.org/p/YVR18-manila-forum-ops-feedback



[Openstack-operators] [nova] FYI on changes that might impact out of tree scheduler filters

2018-05-17 Thread Matt Riedemann
CERN has upgraded to Cells v2 and is doing performance testing of the 
scheduler, and they reported some things today which got us back to this 
bug [1]. So I've started pushing some patches related to this, but also 
related to an older blueprint I created [2]. In summary, we do quite a 
bit of DB work just to load up a list of instance objects per host that 
the in-tree filters don't even use.


The first change [3] is a simple optimization to avoid the default joins 
on the instance_info_caches and security_groups tables. If you have out 
of tree filters that, for whatever reason, rely on the 
HostState.instances objects to have info_cache or security_groups set, 
they'll continue to work, but will have to round-trip to the DB to 
lazy-load the fields, which is going to be a performance penalty on that 
filter. See the change for details.
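
For illustration, a hypothetical out-of-tree filter along these lines
(the filter and its check are made up, not anything in-tree) would keep
working after [3], but each info_cache access would now cost a lazy-load
round trip to the DB:

    from nova.scheduler import filters

    class NoCachedNetworkFilter(filters.BaseHostFilter):
        """Hypothetical out-of-tree filter touching a field that is
        no longer eagerly joined."""

        def host_passes(self, host_state, spec_obj):
            for inst in host_state.instances.values():
                # info_cache is no longer joined by default, so reading
                # it here lazy-loads it, one DB round trip per instance.
                if inst.info_cache is None:
                    return False
            return True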


The second change in the series [4] is more drastic: we do away with 
pulling the full Instance object per host, which means only a select 
set of optional fields can be lazy-loaded [5], and accessing the rest 
will result in an exception. The patch currently has a workaround 
config option to continue doing things the old way if you have out of 
tree filters that rely on this, but good citizens with only in-tree 
filters will get a performance improvement during scheduling.


There are some other things we can do to optimize more of this flow, but 
this email is just about the ones that have patches up right now.


[1] https://bugs.launchpad.net/nova/+bug/1737465
[2] 
https://blueprints.launchpad.net/nova/+spec/put-host-manager-instance-info-on-a-diet

[3] https://review.openstack.org/#/c/569218/
[4] https://review.openstack.org/#/c/569247/
[5] 
https://github.com/openstack/nova/blob/de52fefa1fd52ccaac6807e5010c5f2a2dcbaab5/nova/objects/instance.py#L66


--

Thanks,

Matt



Re: [Openstack-operators] Need feedback for nova aborting cold migration function

2018-05-17 Thread Matt Riedemann

On 5/15/2018 3:48 AM, saga...@nttdata.co.jp wrote:

We store the service logs which are created by the VM on that storage.


I don't mean to be glib, but have you considered maybe not doing that?

--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-17 Thread George Mihaiescu
We have other scheduled tests that perform end-to-end checks (assign a
floating IP, ssh, ping outside) and have never had an issue.
I think we turned it off because the callback code was initially buggy and
nova would wait forever while things were in fact OK, but I'll change
"vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300" and run
another large test, just to confirm.
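
For reference, this is what's being toggled -- a minimal nova.conf
sketch for the compute nodes, with the values above:

    [DEFAULT]
    # Fail the boot (instance goes to ERROR) if Neutron never sends the
    # network-vif-plugged event...
    vif_plugging_is_fatal = True
    # ...and wait up to 300 seconds for that event before giving up.
    vif_plugging_timeout = 300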

We usually run these large tests after a version upgrade to test the APIs
under load.



On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann wrote:

> On 5/17/2018 9:46 AM, George Mihaiescu wrote:
>
>> and large rally tests of 500 instances complete with no issues.
>>
>
> Sure, except you can't ssh into the guests.
>
> The whole reason for the vif-plugging-is-fatal/timeout and callback code
> was that the upstream CI was unstable without it. The server would
> report as ACTIVE but the ports weren't wired up, so ssh would fail. Having
> an ACTIVE guest that you can't actually do anything with is kind of
> pointless.
>
> --
>
> Thanks,
>
> Matt
>


Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-17 Thread Matt Riedemann

On 5/17/2018 9:46 AM, George Mihaiescu wrote:

and large rally tests of 500 instances complete with no issues.


Sure, except you can't ssh into the guests.

The whole reason for the vif-plugging-is-fatal/timeout and callback code 
was that the upstream CI was unstable without it. The server would 
report as ACTIVE but the ports weren't wired up, so ssh would fail. 
Having an ACTIVE guest that you can't actually do anything with is kind 
of pointless.


--

Thanks,

Matt



Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-17 Thread George Mihaiescu
We use "vif_plugging_is_fatal = False" and "vif_plugging_timeout = 0" as
well as "no-ping" in the dnsmasq-neutron.conf, and large rally tests of 500
instances complete with no issues.

These are some good blog posts about Neutron performance:
https://www.mirantis.com/blog/openstack-neutron-performance-and-scalability-testing-summary/
https://www.mirantis.com/blog/improving-dhcp-performance-openstack/

I would run a large rally test like this one and see where most of the
time is spent:
{
    "NovaServers.boot_and_delete_server": [
        {
            "args": {
                "flavor": {
                    "name": "c2.small"
                },
                "image": {
                    "name": "^Ubuntu 16.04 - latest$"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 500,
                "concurrency": 100
            }
        }
    ]
}
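
Assuming that's saved as boot-and-delete.json (the filename is just a
placeholder) and a rally deployment is already configured, running it
would look something like:

    rally task start boot-and-delete.json
    rally task report --out report.html

The per-atomic-action breakdown in the report should show whether the
time goes to the nova boot itself, the port wiring, or elsewhere.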


Cheers,
George

On Thu, May 17, 2018 at 7:49 AM, Radu Popescu | eMAG, Technology
<radu.pope...@emag.ro> wrote:

> Hi,
>
> unfortunately, I didn't get the reply in my inbox, so I'm answering from
> the link here:
> http://lists.openstack.org/pipermail/openstack-operators/2018-May/015270.html
> (hopefully, my reply will go to the same thread)
>
> Anyway, I can see the neutron openvswitch agent logs processing the
> interface way after the VM is up (in this case, 30 minutes), and well past
> the vif plugging timeout of 5 minutes (currently set to 10 minutes).
> After searching the logs, I came up with an example here (nova compute
> hostname replaced with "nova.compute.hostname"):
>
> http://paste.openstack.org/show/1VevKuimoBMs4G8X53Eu/
>
> As you can see, the request for the VM starts around 3:27AM. Ports get
> created, openvswitch receives the command to wire them up, DHCP is set up,
> but apparently the Neutron server sends the callback only after the
> Neutron Openvswitch agent finishes: the callback is at 2018-05-10
> 03:57:36.177, while the Neutron Openvswitch agent says it completed the
> setup and configuration at 2018-05-10 03:57:35.247.
>
> So, my question is, why is the Neutron Openvswitch agent processing the
> request 30 minutes after the VM is started? And where can I search for
> logs of whatever happens during those 30 minutes?
> And yes, we're using libvirt. At some point, we added some new nova
> compute nodes, and the new ones came with v3.2.0, which was breaking
> migration between hosts. That's why we downgraded (and version-locked)
> everything to v2.0.0.
>
> Thanks,
> Radu
>


Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-17 Thread Radu Popescu | eMAG, Technology
Hi,

unfortunately, I didn't get the reply in my inbox, so I'm answering from the
link here:
http://lists.openstack.org/pipermail/openstack-operators/2018-May/015270.html
(hopefully, my reply will go to the same thread)

Anyway, I can see the neutron openvswitch agent logs processing the interface 
way after the VM is up (in this case, 30 minutes), and well past the vif 
plugging timeout of 5 minutes (currently set to 10 minutes).
After searching the logs, I came up with an example here (nova compute 
hostname replaced with "nova.compute.hostname"):

http://paste.openstack.org/show/1VevKuimoBMs4G8X53Eu/

As you can see, the request for the VM starts around 3:27AM. Ports get created, 
openvswitch receives the command to wire them up, DHCP is set up, but 
apparently the Neutron server sends the callback only after the Neutron 
Openvswitch agent finishes: the callback is at 2018-05-10 03:57:36.177, while 
the Neutron Openvswitch agent says it completed the setup and configuration at 
2018-05-10 03:57:35.247.

So, my question is, why is the Neutron Openvswitch agent processing the request 
30 minutes after the VM is started? And where can I search for logs of whatever 
happens during those 30 minutes?
And yes, we're using libvirt. At some point, we added some new nova compute 
nodes, and the new ones came with v3.2.0, which was breaking migration between 
hosts. That's why we downgraded (and version-locked) everything to v2.0.0.
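
For digging into that gap, a minimal sketch of lining up the two logs by
port UUID (the UUID and the log paths below are placeholders; adjust for
your deployment):

    # Placeholder port UUID -- take the real one from "openstack port list".
    PORT_ID="4a1b2c3d-0000-0000-0000-000000000000"

    # When did neutron-server fire the callback vs. when did the agent
    # actually wire the port? Paths assume a typical distro layout.
    grep "$PORT_ID" /var/log/neutron/server.log
    grep "$PORT_ID" /var/log/neutron/openvswitch-agent.log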

Thanks,
Radu