Re: [Openstack] profiling nova-api
Hi Yun!

Thanks very much for sharing this information. Can I ask you to pull the code in this branch:

https://review.openstack.org/#change,5762

and retry your test? On my test machines, the work that Huang Zhiteng has done to convert Nova's servers to use multiple operating system processes, each with its own greenthread service pool, resulted in a massive throughput improvement, turning a quickly bottlenecked system into a server that could scale nearly linearly with the number of worker processes. Be sure to set osapi_workers to the number of cores your machine has... and also note that your code profiling technique is unlikely to be effective, since cProfile wouldn't track the forked child worker processes' stacks, AFAIK. Still interested to see if the time to execute the 300 API calls is dramatically reduced, though.

Looking forward to any results you might have.

Best,
-jay

On 04/11/2012 04:48 PM, Yun Mao wrote:

Hi Stackers,

I spent some time looking at nova-api today.

Setup: everything-on-one-node devstack, essex trunk. I set up 1 user with 10 tiny VMs. Client: 3 Python threads, each doing a loop of the "nova list" equivalent 100 times. So 300 API calls with concurrency=3.

How to profile:

python -m cProfile -s time /opt/stack/nova/bin/nova-api --flagfile=/etc/nova/nova.conf --logdir=/var/log/nova --nodebug

The partial output is attached at the end.

Observations:

* It takes about 60 seconds to finish. CPU usage of nova-api is around 70% to 90%.

* Database access: each "nova list" API call issues 4 db API calls: 3x instance_get_all_by_filters() and 1x instance_fault_get_by_instance_uuids(), so 1200 db API calls total (note: not necessarily 1200 SQL statements; it could be more). The 900 instance_get_all_by_filters() calls took 30.2 seconds (i.e. 0.03s each)! The 300 instance_fault_get_by_instance_uuids() calls only took 1.129 seconds (0.004s each). You might think: MySQL sucks. Not so fast. Remember this is a tiny database with only 10 VMs.
Profile also shows that the actual _mysql.connection.query() method only took 1.883 seconds in total. So we pretty much spend 29 seconds out of the 60 doing either sqlalchemy work or our own wrappers. You can also see this from the sheer volume of sqlalchemy library calls involved.

* The cfg.py library is inefficient: during the 300 API calls, common.cfg.ConfigOpts._get() is called 135005 times! and we paid 2.470 sec for that.

Hopefully this is useful for whoever wants to improve the performance of nova-api.

Thanks,
Yun

===
23355694 function calls (22575841 primitive calls) in 77.874 seconds

Ordered by: internal time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          812   25.725    0.032   25.725    0.032 {method 'poll' of 'select.epoll' objects}
         2408    1.883    0.001    1.883    0.001 {method 'query' of '_mysql.connection' objects}
        70380    1.667    0.000    7.187    0.000 expression.py:2263(corresponding_column)
       135005    1.254    0.000    2.470    0.000 cfg.py:1058(_get)
        41027    1.043    0.000    1.907    0.000 schema.py:542(__init__)
        38802    1.008    0.000    1.219    0.000 __init__.py:451(format)
       162206    0.821    0.000    0.821    0.000 util.py:883(values)
      1530666    0.773    0.000    0.774    0.000 {isinstance}
135046/134960    0.716    0.000    1.919    0.000 cfg.py:1107(_substitute)
         1205    0.713    0.001    1.369    0.001 base.py:2106(__init__)
       183600    0.690    0.000    0.796    0.000 interfaces.py:954(_reduce_path)
        81002    0.687    0.000    2.492    0.000 compiler.py:312(visit_label)
        38802    0.650    0.000    6.087    0.000 log.py:227(format)
       319270    0.622    0.000    0.748    0.000 attributes.py:164(__get__)
890242/884229    0.608    0.000    1.885    0.000 {getattr}
        40500    0.605    0.000    3.101    0.000 schema.py:955(_make_proxy)
120783/120738    0.603    0.000    0.605    0.000 {method 'sub' of '_sre.SRE_Pattern' objects}
        81000    0.601    0.000    2.156    0.000 interfaces.py:677(create_row_processor)
        63000    0.590    0.000    0.707    0.000 times.py:44(DateTime_or_None)
        98102    0.588    0.000    0.886    0.000 compiler.py:337(visit_column)
       658098    0.580    0.000    0.581    0.000 {method 'intersection' of 'set' objects}
       109802    0.562    0.000    0.562    0.000 expression.py:3625(_from_objects)
  231610/1202    0.551    0.000    5.813    0.005 visitors.py:58(_compiler_dispatch)
       144002    0.510    0.000    0.693    0.000 compiler.py:622(_truncated_identifier)
135005/134960    0.485    0.000    4.872    0.000 cfg.py:860(__getattr__)
         2408    0.463    0.000    1.942    0.001 {built-in method fetch_row}
        71100    0.460    0.000    0.580    0.000 strategies.py:121(create_row_processor)
       299031    0.437    0.000    0.437    0.000 {_codecs.utf_8_decode}
         6000    0.437    0.000    1.799    0.000 models.py:93(iteritems)
   36000/9000    0.409    0.000    4.791    0.001 mapper.py:2146(populate_state)
        81002    0.393    0.000    1.104    0.000 compiler.py:672(label_select_column)
   45000/9000    0.390    0.000    5.148    0.001 mapper.py:2186(_instance)
         1202    0.366    0.000    5.797    0.005 compiler.py:701(visit_select)
  231610/1202    0.347    0.000    5.817    0.005 base.py:714(process)
       172800    0.341    0.000    1.148    0.000 interfaces.py:651(_get_context_strategy)
        25868    0.339    0.000    0.339    0.000 {method 'write' of 'file' objects}
  91800/52200    0.327    0.000    6.728    0.000 interfaces.py:673(setup)
        22801    0.324    0.000    0.324    0.000 {method 'strftime' of 'datetime.date' objects}
        78478    0.316    0.000    0.914    0.000 expression.py:2143(contains_column)
        25868    0.314    0.000    0.314    0.000 {method 'flush' of 'file' objects}
       299033    0.309    0.000    0.930    0.000 {method 'decode' of 'str' objects}
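Yun's client setup (3 threads, each issuing the "nova list" equivalent 100 times) can be sketched as a small load generator. This is an illustrative stand-in, not code from the test: run_load is an invented name, and the stubbed call would be a python-novaclient request against the devstack endpoint in a real run.

```python
import threading
import time

def run_load(call, threads=3, iterations=100):
    """Fire `call` from several threads, `iterations` times each,
    and return the total wall-clock time in seconds."""
    def worker():
        for _ in range(iterations):
            call()
    pool = [threading.Thread(target=worker) for _ in range(threads)]
    start = time.time()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return time.time() - start

# With a real client, `call` would be something like
# lambda: client.servers.list() via python-novaclient.
elapsed = run_load(lambda: None)
print('300 calls in %.2f s' % elapsed)
```

With a ~60 s wall time for 300 calls at concurrency 3, this works out to the roughly 0.2 s per request that the profile breaks down.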
Re: [Openstack] profiling nova-api
Both the profiling and the multi-process work are great - good stuff!

Jay: Won't going multi-process just make it harder to profile? I think it's actually a good thing to profile without the multi-process patch, find and fix the bottlenecks in single-request performance, and then use multi-process for great concurrent-request performance?

Justin

On Thu, Apr 12, 2012 at 6:49 AM, Jay Pipes jaypi...@gmail.com wrote:
[...]
Re: [Openstack] profiling nova-api
On 04/12/2012 10:39 AM, Justin Santa Barbara wrote:
> Both the profiling and the multi-process work are great - good stuff!
> Jay: Won't going multi-process just make it harder to profile?

Yep, that's why I said "also note that your code profiling technique is unlikely to be effective since cProfile wouldn't track the forked child worker processes' stacks" ;)

> I think it's actually a good thing to profile without the multi-process patch, find and fix the bottlenecks in single-request performance, and then use multi-process for great concurrent-request performance?

Agreed!

-jay

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
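One common workaround for the blind spot Jay mentions - cProfile not following forked children - is to start a separate profiler inside each worker and dump per-PID stats. This is a sketch, not the patch under review; the `profiled` decorator and the output file naming are made up for illustration.

```python
import cProfile
import os

def profiled(func):
    """Run `func` under its own cProfile session and dump the stats
    to a per-PID file, so each forked worker gets its own profile."""
    def wrapper(*args, **kwargs):
        prof = cProfile.Profile()
        try:
            return prof.runcall(func, *args, **kwargs)
        finally:
            # One stats file per process; inspect later with pstats.
            prof.dump_stats('nova-api-%d.prof' % os.getpid())
    return wrapper
```

Wrapping each child's service loop this way yields one .prof file per worker, which pstats can then load and sort just like the single-process output above.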
Re: [Openstack] profiling nova-api
Just a small warning: since cProfile has its own overhead on entering/exiting every method, and uses deterministic rather than statistical profiling, it may misrepresent tiny functions. They will take much more wall-clock time simply due to the profiling overhead * number of executions.

I'm not saying cfg is not slow, but in this case it may be worth confirming the actual time spent on each call by doing a minimalistic test with timeit, without any profiler (request the same variable 100 times and see how much it really takes per iteration).

Regards,
Stanisław Pitucha
Cloud Services, Hewlett Packard

-----Original Message-----
From: Yun Mao
Sent: Wednesday, April 11, 2012 9:48 PM
To: openstack@lists.launchpad.net
Subject: [Openstack] profiling nova-api

[...]
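Stanisław's suggested cross-check might look like the following. The Conf class is a hypothetical stand-in for a cfg.ConfigOpts-style object (every option access going through __getattr__), not the real oslo cfg implementation; the point is measuring one lookup with timeit, outside any profiler.

```python
import timeit

class Conf(object):
    """Hypothetical stand-in for a cfg.ConfigOpts-style object that
    resolves every option access through __getattr__."""
    def __init__(self):
        self._opts = {'osapi_workers': 4}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._opts[name]
        except KeyError:
            raise AttributeError(name)

conf = Conf()
n = 100000
per_call = timeit.timeit(lambda: conf.osapi_workers, number=n) / n
print('%.3f us per option lookup' % (per_call * 1e6))
```

If the un-profiled per-call time is far below the ~18 us/call implied by the profile (2.470 s / 135005 calls), that gap is cProfile's instrumentation overhead rather than cfg being slow.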
Re: [Openstack] profiling nova-api
Hi Jay,

I will take a look later when I find a box with multiple cores to take advantage of it. Agreed - cProfile is not so useful in this case; this would be a pure performance benchmark. If implemented correctly, we should see a notable gain. However, it will probably not be that linear, because we might hit keystone's performance issue next - it looks like the second-biggest CPU hog behind nova-api. Is there a multiprocess keystone too?

Yun

On Thu, Apr 12, 2012 at 9:49 AM, Jay Pipes jaypi...@gmail.com wrote:
[...]
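The multi-worker model under discussion boils down to forking N processes that all accept on one shared listening socket, so the kernel spreads incoming connections among them. A bare-bones sketch, with invented names: serve_forever is a placeholder for the real greenthread-pool WSGI loop, and start_workers stands in for the service launcher in the patch.

```python
import os

def serve_forever(sock):
    # Placeholder: in the real service each worker would run an
    # eventlet/greenthread WSGI loop accepting on the shared socket.
    pass

def start_workers(sock, workers):
    """Fork `workers` child processes that all accept() on the same
    listening socket; the kernel load-balances connections."""
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:            # child process: serve, then exit
            serve_forever(sock)
            os._exit(0)
        pids.append(pid)        # parent keeps track of its children
    return pids
```

Setting workers to the core count (the osapi_workers suggestion) gives each core its own GIL, which is why throughput can scale nearly linearly with processes where green threads alone cannot.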
Re: [Openstack] profiling nova-api
Awesome, great stuff to know!

On 4/11/12 1:48 PM, Yun Mao yun...@gmail.com wrote:
[...]