Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate
On 11/02/2016 08:51 AM, Major Hayden wrote:
> At this point, I'm still trying to test some additional theories. Does anyone
> have any other ideas?

Here's an update for today. There are a few bugs open now:

OpenStack-Ansible bug: https://bugs.launchpad.net/openstack-ansible/+bug/1637494
Ubuntu python2.7 bug: https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/1638695

The suggestion from the python2.7 bug is to compile python 2.7.12 with gcc-4.8 on 16.04 to see if the performance issue is related to GCC. I haven't had a chance to test that out yet, but if someone else has a moment to try it, I'd be much obliged. ;)

There is also a private bug opened with Canonical that has been escalated as part of my company's support contract with Canonical. I'll provide relevant updates from that bug when I get them.

--
Major Hayden

signature.asc Description: OpenPGP digital signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
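Before rebuilding with gcc-4.8, it may help to confirm which compiler built each interpreter being compared. A minimal sketch using only the standard library's `platform` and `sysconfig` modules (both present in Python 2.7 and 3); the `build_info` helper is a hypothetical name, and the `CC`/`CFLAGS` build variables may be `None` on some builds:

```python
import platform
import sysconfig


def build_info():
    """Report how the running interpreter was built (hypothetical helper)."""
    return {
        "version": platform.python_version(),
        # Compiler string recorded at build time, e.g. something like "GCC 4.8.4"
        "compiler": platform.python_compiler(),
        # Compiler command and flags from the build configuration (may be None)
        "cc": sysconfig.get_config_var("CC"),
        "cflags": sysconfig.get_config_var("CFLAGS"),
    }


if __name__ == "__main__":
    for key, value in sorted(build_info().items()):
        print("%-8s %s" % (key, value))
```

Running it under each interpreter (packaged 2.7.12, a gcc-4.8 build, and so on) makes it easy to confirm that the builds being timed actually differ in the way the test intends.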
Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate
> On 11/2/16, 1:51 PM, "Major Hayden" wrote:
> I tossed up a horribly written hack[0] to change some CPU scheduler
> settings back to the Trusty settings. My initial tests were great! Also,
> the first test in OpenStack CI was really good -- 62 minutes for trusty and
> 65 minutes for xenial. However, that seems to be a fluke since the second
> test had a 30 minute gap between the test durations. :(

I think that difference was due to the hardware/contention profiles of the different nodepool providers. You'll have to do tests somewhere where you can execute on a consistent hardware profile, ideally with no other contention on the host, in order to get reliable comparisons.

I think Logan may be able to help with that. Alternatively, perhaps you can get access to an OSIC host or instance for testing?
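Because comparisons are only meaningful on a consistent hardware profile, one option is to record the host's CPU model and core count alongside every timing, so results from different providers are never compared directly. A sketch, assuming a Linux host that exposes `/proc/cpuinfo`; `cpu_model` and `host_profile` are hypothetical helper names:

```python
import multiprocessing


def cpu_model(cpuinfo_text):
    """Return the first 'model name' value from /proc/cpuinfo text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return None


def host_profile(cpuinfo_path="/proc/cpuinfo"):
    """Describe the host so timings are only compared within one profile."""
    try:
        with open(cpuinfo_path) as handle:
            model = cpu_model(handle.read())
    except IOError:  # not Linux, or /proc is unavailable
        model = None
    return {"cpu_model": model, "cpus": multiprocessing.cpu_count()}
```

This does not detect contention from other tenants on the same host; it only catches the case where two runs landed on different hardware.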
Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate
On 10/28/2016 04:02 AM, Major Hayden wrote:
> On the topic of threads, the sysbench output from both Trusty and Xenial is
> nearly identical with the exception of the threads test. Trusty is usually about
> 15-20% faster on that benchmark than Xenial.

I spoke with a few other people and it seems like the culprit could be a CPU scheduler difference and/or a glibc change.

After messing around with perf for a long time, I found that context switches and CPU migrations were slightly higher on Xenial than on Trusty, but by a negligible amount (< 10% at worst).

I tossed up a horribly written hack[0] to change some CPU scheduler settings back to the Trusty settings. My initial tests were great! Also, the first test in OpenStack CI was really good -- 62 minutes for trusty and 65 minutes for xenial. However, that seems to be a fluke since the second test had a 30 minute gap between the test durations. :(

Those scheduler changes for busy_factor, min_interval, and max_interval appear to have been made in the upstream Linux kernel, and they're present on various distributions like Ubuntu, CentOS, and Fedora.

At this point, I'm still trying to test some additional theories. Does anyone have any other ideas?

[0] https://review.openstack.org/392316

--
Major Hayden
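The `busy_factor`, `min_interval`, and `max_interval` values can be compared between two hosts by reading them from procfs. A read-only sketch, assuming a kernel that exposes `/proc/sys/kernel/sched_domain/cpuN/domainM/` (kernels of this era built with `CONFIG_SCHED_DEBUG`; newer kernels moved these files to debugfs), so it returns an empty result when the directory is missing. `read_sched_domains` is a hypothetical helper name:

```python
import os

TUNABLES = ("busy_factor", "min_interval", "max_interval")


def read_sched_domains(root="/proc/sys/kernel/sched_domain"):
    """Map 'cpuN/domainM' -> {tunable: value} for the scheduler knobs above."""
    settings = {}
    if not os.path.isdir(root):
        return settings  # not exposed on this kernel (assumption noted above)
    for cpu in sorted(os.listdir(root)):
        cpu_dir = os.path.join(root, cpu)
        if not os.path.isdir(cpu_dir):
            continue
        for domain in sorted(os.listdir(cpu_dir)):
            domain_dir = os.path.join(cpu_dir, domain)
            values = {}
            for name in TUNABLES:
                path = os.path.join(domain_dir, name)
                if os.path.isfile(path):
                    with open(path) as handle:
                        values[name] = handle.read().strip()
            if values:
                settings["%s/%s" % (cpu, domain)] = values
    return settings
```

Dumping this dictionary on a Trusty host and a Xenial host and diffing the two makes it easy to see exactly which domains a hack like [0] would need to touch.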
Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate
On 10/28/2016 10:17 AM, Major Hayden wrote:
>> Also, when running the tests on both systems, track cpu usage and number
>> of threads to see if one has more restrictions than the other.
> Almost no difference here.

On the topic of threads, the sysbench output from both Trusty and Xenial is nearly identical with the exception of the threads test. Trusty is usually about 15-20% faster on that benchmark than Xenial. That leads me to rule out a few things:

1) It's probably not python that is slow, since the slowdown affects sysbench, too
2) The kernel version doesn't seem to make a difference
3) The way python was compiled doesn't matter (I tried pyenv)
4) Kernel tunables (via sysctl) look very similar, especially with regard to threads

I also ran the full suite of tests from nova and got these results:

Trusty: 375 seconds
Xenial: 531 seconds

--
Major Hayden
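For scale, the nova numbers mean Xenial needs roughly 42% more time than Trusty for the same suite: (531 - 375) / 375 = 0.416. A tiny helper makes the arithmetic explicit (`percent_slower` is an illustrative name):

```python
def percent_slower(baseline_seconds, candidate_seconds):
    """How much longer the candidate took, as a percentage of the baseline."""
    return (candidate_seconds - baseline_seconds) / float(baseline_seconds) * 100.0


# Nova test suite: Trusty (baseline) vs Xenial
print("Xenial is %.1f%% slower" % percent_slower(375, 531))
```

Equivalently, Trusty finishes the suite in about 71% of Xenial's time (375 / 531 ≈ 0.706), which is a larger gap than the 15-20% seen in the sysbench threads test.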
Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate
On 10/28/2016 01:44 AM, Mike Carden wrote:
> I bounced this off my 'distro differences' go-to guy, Chris Smart. Here are
> his thoughts:
>
> "Run the 14.04 kernel on a 16.04 system and re-run the tests to see if it's
> kernel related.
>
> If 16.04 userland with 14.04 kernel is as fast as Ubuntu 14.04, then
> compare the kernel .config files to see if there were major changes,
> like switching out schedulers.

14.04 with 16.04's kernel is actually just a small amount (~3-5%) faster than 14.04 with its standard kernel.

> Also, when running the tests on both systems, track cpu usage and number
> of threads to see if one has more restrictions than the other.

Almost no difference here.

> Check swappiness and also "vmstat 1" to see if you're getting more pages
> swapped in and out in 16.04.

No difference here, either.

> I'm assuming that the two virtual machines are identical (CPU type, memory,
> threads, virtio, etc)."

They are! We've seen this occur in the OpenStack CI jobs (with KVM), and I've also tested this with Xen and bare metal.

--
Major Hayden
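The `vmstat 1` swap check can also be scripted by sampling the cumulative `pswpin`/`pswpout` counters in `/proc/vmstat` (pages swapped in and out since boot) and looking at the per-interval change. A sketch, assuming a Linux host; `parse_vmstat`, `swap_deltas`, and `sample_swap` are hypothetical helper names:

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_deltas(before, after):
    """Pages swapped in/out between two parsed samples."""
    return {
        key: after.get(key, 0) - before.get(key, 0)
        for key in ("pswpin", "pswpout")
    }


def sample_swap(interval=1.0, path="/proc/vmstat"):
    """Swap activity over one interval, counted in pages."""
    with open(path) as handle:
        first = parse_vmstat(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        second = parse_vmstat(handle.read())
    return swap_deltas(first, second)
```

This tracks the same activity as the si/so columns of `vmstat 1`, but as page counts rather than memory per second, so the numbers are not directly interchangeable.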
Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate
Major,

I bounced this off my 'distro differences' go-to guy, Chris Smart. Here are his thoughts:

"Run the 14.04 kernel on a 16.04 system and re-run the tests to see if it's kernel related.

If 16.04 userland with 14.04 kernel is as fast as Ubuntu 14.04, then compare the kernel .config files to see if there were major changes, like switching out schedulers.

Also, when running the tests on both systems, track cpu usage and number of threads to see if one has more restrictions than the other.

Check swappiness and also "vmstat 1" to see if you're getting more pages swapped in and out in 16.04.

I'm assuming that the two virtual machines are identical (CPU type, memory, threads, virtio, etc)."

--
MC
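The suggested kernel `.config` comparison can be narrowed to just the options that differ between two config files (for example, the `/boot/config-*` files shipped with each release). A sketch that treats `# CONFIG_FOO is not set` as an explicit `n`; `parse_config` and `diff_configs` are hypothetical names, and the kernel tree's own `scripts/diffconfig` is a more complete tool for the same job:

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set"
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value} from kernel .config text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Yield (option, old_value, new_value) for every option that differs."""
    old, new = parse_config(old_text), parse_config(new_text)
    for option in sorted(set(old) | set(new)):
        if old.get(option) != new.get(option):
            yield option, old.get(option), new.get(option)
```

A value of `None` means the option is absent from that file entirely, which is common across kernel versions and usually less interesting than a `y`/`n` flip on a scheduler-related option.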
[openstack-dev] [openstack-ansible] Debugging slow Xenial gate
Hey there,

We've talked about the slow Xenial gate during the OpenStack Summit this week and I decided to do a little digging. I built two quick test instances: one with Trusty and the other with Xenial. Trusty comes with python 2.7.6 and Xenial has 2.7.12.

Here are the initial comparisons: https://gist.github.com/major/20d7d11442685355c30d0abf0c07be98

The worst test shows that 2.7.12 on Xenial is 1.88x slower than 2.7.6 on Trusty. Wow.

I compiled 2.7.12 from source on Xenial to see if it's a packaging issue, but that didn't change much. I then compiled 2.7.12 on 14.04 and found it to be slightly slower than 2.7.6 on 14.04, but faster than 2.7.12 on 16.04. That's confusing, so here's a ranking from fastest to slowest:

1) 2.7.6 on Ubuntu 14.04 (fastest)
2) 2.7.12 compiled from source on Ubuntu 14.04 (a little slower than #1)
3) 2.7.12 compiled from source on Ubuntu 16.04 (slightly faster than #4)
4) 2.7.12 on Ubuntu 16.04 (significantly slower than #1)

It's evident that 2.7.12 is a little bit slower, but something in Ubuntu 16.04 makes it much worse.

I checked sysctl settings and the only big difference was the max threads per process (16.04 was about half of 14.04). I set them both to the same value, but the performance results didn't change.

Does anyone else have any ideas of what might be causing this?

--
Major Hayden
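One way to reproduce this kind of interpreter-to-interpreter comparison is to run the same pure-Python workload under each build and keep the best of several repeats, which filters out some scheduling noise. The workload below is an arbitrary stand-in, not the benchmark behind the gist above, and `workload`/`best_time` are hypothetical names; the script runs unchanged under Python 2.7 and 3:

```python
import platform
import timeit


def workload():
    """Arbitrary CPU-bound pure-Python work (a stand-in, not the gist's tests)."""
    total = 0
    for i in range(20000):
        total += i * i % 7
    return total


def best_time(func, repeat=5, number=20):
    """Best per-call time in seconds across `repeat` batches of `number` calls."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


if __name__ == "__main__":
    print("%s %s: %.6f s per call" % (
        platform.python_implementation(),
        platform.python_version(),
        best_time(workload),
    ))
```

Dividing the slower build's per-call time by the faster one's gives a ratio in the same form as the 1.88x figure, as long as both runs happen on the same host.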