Re: [openstack-dev] In memory joins in Nova
On 8/11/15 7:14 PM, Sachin Manpathak wrote:
> I am struggling with python code profiling in general. It has its own caveats, like 100%-plus overhead. However, on a host with only nova services (DB on a different host), I see CPU utilization spike up quickly with scale. The DB server is relatively calm and never goes over 20%. On a system which relies on the DB to fetch all the data, this should not happen.

The DB's resources are intended to scale up in response to a wide degree of concurrency, that is, lots and lots of API services all hitting it from many concurrent API calls. "With scale" here is a slippery term. What kind of concurrency are you testing with? How many CPUs serving API calls are utilized simultaneously? To saturate the database you need many dozens, and even then you don't want your database CPU going very high. 20% does not seem that low to me, actually.

I disagree with the notion that high database CPU indicates a performant application, or that DB saturation is a requirement for a database-backed application to be performant; I think the opposite is true. In web application development, when I worked with production sites at high volume, the goal was to use enough caching that major site pages being viewed constantly could be delivered with *no* database access whatsoever. We wanted to see the majority of the site being sent to customers with the database at essentially zero; this is how you get page response times down from 200-300 ms to 20 or 30. If you want to measure performance, looking at API response time is probably better than looking at CPU utilization first.

That said, Python is a very CPU-intensive language, because it is an interpreted scripting language. Operations that would be hardly a whisper of CPU in a compiled language like C end up being major operations in Python.
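The cache-first approach described above boils down to a read-through cache in front of the expensive query-and-assemble path. A minimal sketch, with the caveat that everything here is illustrative: a plain dict with a TTL stands in for a real memcached client, and the key scheme and helper names are assumptions, not Nova code.

```python
import time

# Read-through cache sketch: serve repeated reads with zero database access.
# A plain dict stands in for memcached; names and TTL are illustrative.
_cache = {}
TTL_SECONDS = 60.0

def get_instance_view(instance_id, build_from_db):
    """Return a pre-assembled view of an instance, hitting the DB only on a miss."""
    key = "instance-view/%s" % instance_id
    entry = _cache.get(key)
    if entry is not None and time.monotonic() - entry[0] < TTL_SECONDS:
        return entry[1]                      # cache hit: no DB access at all
    view = build_from_db(instance_id)        # expensive: queries plus assembly
    _cache[key] = (time.monotonic(), view)
    return view
```

On a hit the request never touches the database at all, which is how page response times drop from hundreds of milliseconds to tens.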
Openstack suffers from a large amount of function call overhead even for simple API operations, as it is an extremely layered system with very little use of caching. Until it moves to a JIT-based interpreter like PyPy that can flatten out call chains, the amount of overhead just for an API call to come in and go back out with a response will remain significant.

As for caching, making use of a technique such as memcached caching of data structures can also greatly improve performance, because we can cache pre-assembled data, removing the need to repeatedly extract it from multiple tables and piece it together in Python, which is also a very CPU-intensive activity. This is something that will be happening more in the future, but as it improves the performance of Openstack, it will be removing even more load from the database. Again, I'd look at API response times as the first thing to measure.

That said, certainly the joining of data in Python may be unnecessary, and I'm not sure we can't revisit the history Dan refers to when he says there were very large result sets. If we are referring to the number of rows, joining in SQL or in Python will still involve the same number of rows, and SQLAlchemy also offers many techniques for optimizing the overhead of fetching lots of rows which Nova currently doesn't make use of (see https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Eager_load_and_Column_load_tuning for a primer on this).

If OTOH we are referring to the width of the columns, and the join is such that you're going to get the same A identity over and over again, then joining A and B gives you a wide row with all of A and B, with a very large amount of redundant data sent over the wire again and again (note that the database drivers available to us in Python always send all rows and columns over the wire unconditionally, whether or not we fetch them in application code).
In this case you *do* want to do the join in Python to some extent, though you use the database to deliver the simplest information possible to work with first: you get the full row for all of the A entries, then issue a second query for all of B plus A's primary key, which can be quickly matched to that of A. SQLAlchemy offers this as subquery eager loading, and it is definitely much more performant than a single full join when you have wide rows for individual entities. The database is still doing the join to the extent that it can deliver the primary key information for A and B, which can be operated upon very quickly in memory, as we already have all the A identities in a hash lookup in any case.

Overall, if you're looking to make Openstack faster, where you want to be is: 1. what is the response time of an API call, and 2. what do the Python profiles look like for those API calls? For a primer on Python profiling see, for example, my own FAQ entry here: http://docs.sqlalchemy.org/en/rel_1_0/faq/performance.html#code-profiling. This kind of profiling is a lot of work and is very tedious, compared to just running a big rally job.
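The A/B pattern described here, which SQLAlchemy exposes as subquery eager loading, reduces to two queries plus a hash match. A simplified sketch of that shape using the stdlib's sqlite3 (the schema is illustrative, not Nova's, and real subqueryload generates the second query against a subquery of the first):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, value TEXT);
    INSERT INTO a VALUES (1, 'wide row one'), (2, 'wide row two');
    INSERT INTO b VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 2, 'z');
""")

# Query 1: the full (wide) A rows cross the wire exactly once each.
a_rows = {a_id: {"payload": payload, "b": []}
          for a_id, payload in conn.execute("SELECT id, payload FROM a")}

# Query 2: only B's columns plus A's primary key -- no wide A data repeated.
for a_id, value in conn.execute("SELECT a_id, value FROM b ORDER BY id"):
    a_rows[a_id]["b"].append(value)   # O(1) hash lookup against the A identities

assert a_rows[1]["b"] == ["x", "y"] and a_rows[2]["b"] == ["z"]
```

The in-memory match is cheap because the A identities are already keyed in a dict; only the redundant wide-row transfer is avoided.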
Re: [openstack-dev] In memory joins in Nova
> If OTOH we are referring to the width of the columns and the join is such that you're going to get the same A identity over and over again, if you join A and B you get a wide row with all of A and B with a very large amount of redundant data sent over the wire again and again (note that the database drivers available to us in Python always send all rows and columns over the wire unconditionally, whether or not we fetch them in application code).

Yep, it was this. N instances times M rows of metadata each. If you pull 100 instances and they each have 30 rows of system metadata, that's a lot of data, and most of it is the instance being repeated 30 times for each metadata row. When we first released code doing this, a prominent host immediately raised the red flag because their DB traffic shot through the roof.

> In this case you *do* want to do the join in Python to some extent, though you use the database to deliver the simplest information possible to work with first; you get the full row for all of the A entries, then a second query for all of B plus A's primary key that can be quickly matched to that of A.

This is what we're doing. Fetch the list of instances that match the filters, then, for the ones that were returned, get their metadata.

--Dan

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
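Dan's figures make the redundancy easy to quantify. Only the 100 instances and 30 metadata rows come from the thread; the per-row byte widths below are assumptions for illustration, not measured Nova row sizes.

```python
# Back-of-envelope on the wide-row redundancy described above.
n_instances = 100        # instances returned by the query (from the thread)
meta_rows_each = 30      # system_metadata rows per instance (from the thread)
instance_width = 2000    # assumed bytes per (wide) instance row
meta_width = 100         # assumed bytes per metadata row

# Single SQL join: every metadata row drags the whole instance row with it.
joined_bytes = n_instances * meta_rows_each * (instance_width + meta_width)

# Two queries (join in Python): each instance row crosses the wire once.
split_bytes = (n_instances * instance_width
               + n_instances * meta_rows_each * meta_width)

print(joined_bytes, split_bytes, joined_bytes // split_bytes)
```

On these assumed widths the single join moves over an order of magnitude more bytes, which matches the "DB traffic shot through the roof" experience.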
Re: [openstack-dev] In memory joins in Nova
> In the past I've taken a different approach to problematic one-to-many relationships and have made the metadata a binary JSON blob. Is there some reason that won't work?

We have done that for various pieces of data that were previously in system_metadata. Where this breaks down is if you need to be able to select instances based on keys in the metadata blob, which we do in various scheduling operations (certainly for aggregate metadata, at least). I *believe* we have to leave metadata as row-based for that reason (although honestly I don't remember the details), and probably system_metadata as well, but I'd have to survey what is left in there.

> Since the metadata is nearly always queried as a whole, this seems like a valid approach that would keep DB traffic low but also ease the burden of reassembling the collection in nova-api.

'Nearly' being the key word there. We just got done moving all of the flavor information we used to stash in system_metadata to a JSON blob in the database. That cuts 10-30 rows of system_metadata for each instance, depending on the state, and gives us a thing we can selectively join with instance for a single load with little overhead. We might be able to get away with going back to fully joining system_metadata given the reduction in size, but we honestly don't even need to query it as often after the flavor-ectomy, so I'm not sure it's worth it.

Further, after the explosion of system_metadata, which caused us to stop joining it in the first place, it was realized that a user could generate a lot of traffic by exhausting their quota of metadata items (which they control), so we probably want to join user metadata in python anyway for that reason.

So I guess the summary is: I think with flavor data out of the path, the major offender is gone, such that this becomes extremely low on the priority list.
--Dan
Re: [openstack-dev] In memory joins in Nova
Excerpts from Mike Bayer's message of 2015-08-13 11:03:32 +0800:
> On 8/12/15 10:29 PM, Clint Byrum wrote:
>> Excerpts from Dan Smith's message of 2015-08-12 23:12:23 +0800:
>>>> If OTOH we are referring to the width of the columns and the join is such that you're going to get the same A identity over and over again, if you join A and B you get a wide row with all of A and B with a very large amount of redundant data sent over the wire again and again (note that the database drivers available to us in Python always send all rows and columns over the wire unconditionally, whether or not we fetch them in application code).
>>> Yep, it was this. N instances times M rows of metadata each. If you pull 100 instances and they each have 30 rows of system metadata, that's a lot of data, and most of it is the instance being repeated 30 times for each metadata row. When we first released code doing this, a prominent host immediately raised the red flag because their DB traffic shot through the roof.
>> In the past I've taken a different approach to problematic one-to-many relationships and have made the metadata a binary JSON blob. Is there some reason that won't work? Of course, this type of thing can run into concurrency issues on update, but these can be handled by SELECT..FOR UPDATE + intelligent retry on deadlock. Since the metadata is nearly always queried as a whole, this seems like a valid approach that would keep DB traffic low but also ease the burden of reassembling the collection in nova-api.
> JSON blobs have the disadvantage that you are piggybacking an entirely different storage model on top of the relational one, losing all the features you might like about the relational model: rich datatypes (I understand our JSON decoders trip up on plain datetimes?), insert defaults, nullability constraints, a fixed, predefined schema that can be altered in a controlled, all-or-nothing way, efficient storage characteristics, and of course reasonable querying capabilities. They are useful IMO only for small sections of data that are amenable to ad-hoc changes in schema, like simple bags of key-value pairs containing miscellaneous features.

Agreed on all points! And metadata for instances is exactly that: a simple bag of key/value strings that is almost always queried and delivered as a whole.
Re: [openstack-dev] In memory joins in Nova
Excerpts from Dan Smith's message of 2015-08-12 23:12:23 +0800:
>> If OTOH we are referring to the width of the columns and the join is such that you're going to get the same A identity over and over again, if you join A and B you get a wide row with all of A and B with a very large amount of redundant data sent over the wire again and again (note that the database drivers available to us in Python always send all rows and columns over the wire unconditionally, whether or not we fetch them in application code).
> Yep, it was this. N instances times M rows of metadata each. If you pull 100 instances and they each have 30 rows of system metadata, that's a lot of data, and most of it is the instance being repeated 30 times for each metadata row. When we first released code doing this, a prominent host immediately raised the red flag because their DB traffic shot through the roof.

In the past I've taken a different approach to problematic one-to-many relationships and have made the metadata a binary JSON blob. Is there some reason that won't work? Of course, this type of thing can run into concurrency issues on update, but these can be handled by SELECT..FOR UPDATE + intelligent retry on deadlock. Since the metadata is nearly always queried as a whole, this seems like a valid approach that would keep DB traffic low but also ease the burden of reassembling the collection in nova-api.
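The read-modify-write-with-retry idea Clint describes can be sketched with the stdlib's sqlite3. Note the substitutions: sqlite has no SELECT..FOR UPDATE, so BEGIN IMMEDIATE stands in for taking the row lock up front; on MySQL you would issue SELECT..FOR UPDATE inside the transaction and retry on deadlock errors. The schema and function names are illustrative, not Nova's.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions explicitly
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, metadata TEXT)")
conn.execute("INSERT INTO instances VALUES (1, ?)", (json.dumps({"a": "1"}),))

def update_metadata(conn, instance_id, updates, retries=3):
    """Read-modify-write the whole JSON bag under a write lock, retrying on contention."""
    for _ in range(retries):
        try:
            conn.execute("BEGIN IMMEDIATE")      # sqlite stand-in for SELECT..FOR UPDATE
            (blob,) = conn.execute(
                "SELECT metadata FROM instances WHERE id = ?",
                (instance_id,)).fetchone()
            meta = json.loads(blob)
            meta.update(updates)                 # mutate the bag as a whole
            conn.execute("UPDATE instances SET metadata = ? WHERE id = ?",
                         (json.dumps(meta), instance_id))
            conn.commit()
            return meta
        except sqlite3.OperationalError:         # lock contention: roll back and retry
            conn.rollback()
    raise RuntimeError("metadata update failed after %d retries" % retries)

print(update_metadata(conn, 1, {"b": "2"}))
```

The whole bag is rewritten atomically, which is exactly the trade-off under discussion: one row of traffic per update, but no per-key SQL-level access.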
Re: [openstack-dev] In memory joins in Nova
On 8/12/15 10:29 PM, Clint Byrum wrote:
> Excerpts from Dan Smith's message of 2015-08-12 23:12:23 +0800:
>>> If OTOH we are referring to the width of the columns and the join is such that you're going to get the same A identity over and over again, if you join A and B you get a wide row with all of A and B with a very large amount of redundant data sent over the wire again and again (note that the database drivers available to us in Python always send all rows and columns over the wire unconditionally, whether or not we fetch them in application code).
>> Yep, it was this. N instances times M rows of metadata each. If you pull 100 instances and they each have 30 rows of system metadata, that's a lot of data, and most of it is the instance being repeated 30 times for each metadata row. When we first released code doing this, a prominent host immediately raised the red flag because their DB traffic shot through the roof.
> In the past I've taken a different approach to problematic one-to-many relationships and have made the metadata a binary JSON blob. Is there some reason that won't work? Of course, this type of thing can run into concurrency issues on update, but these can be handled by SELECT..FOR UPDATE + intelligent retry on deadlock. Since the metadata is nearly always queried as a whole, this seems like a valid approach that would keep DB traffic low but also ease the burden of reassembling the collection in nova-api.

JSON blobs have the disadvantage that you are piggybacking an entirely different storage model on top of the relational one, losing all the features you might like about the relational model: rich datatypes (I understand our JSON decoders trip up on plain datetimes?), insert defaults, nullability constraints, a fixed, predefined schema that can be altered in a controlled, all-or-nothing way, efficient storage characteristics, and of course reasonable querying capabilities. They are useful IMO only for small sections of data that are amenable to ad-hoc changes in schema, like simple bags of key-value pairs containing miscellaneous features.
Re: [openstack-dev] In memory joins in Nova
On 8/12/15 1:49 PM, Sachin Manpathak wrote:
> Thanks, this feedback was helpful. Perhaps my paraphrasing was misleading. I am not running openstack at scale in order to see how much the DB can sustain. My observation was that the host running nova services saturates on CPU much earlier than the DB does.

You absolutely *want* a single host to be saturated *way* before the database is; the database here is a single vertical service intended to serve hundreds or thousands of horizontally scaled clients simultaneously. A single request at a time should not even be a blip in the database's view of things.

> Joins could be one of the reasons. I also observed that background tasks like instance creation and resource/stats updates contend with get queries. In addition to caching optimizations, prioritizing tasks in nova could help. Is there a nova API to fetch the list of instances without metadata? Until I find a good way to profile openstack code, changing the queries can be a good experiment IMO.
>
> On Wed, Aug 12, 2015 at 8:12 AM, Dan Smith d...@danplanet.com wrote:
>>> If OTOH we are referring to the width of the columns and the join is such that you're going to get the same A identity over and over again, if you join A and B you get a wide row with all of A and B with a very large amount of redundant data sent over the wire again and again (note that the database drivers available to us in Python always send all rows and columns over the wire unconditionally, whether or not we fetch them in application code).
>> Yep, it was this. N instances times M rows of metadata each. If you pull 100 instances and they each have 30 rows of system metadata, that's a lot of data, and most of it is the instance being repeated 30 times for each metadata row. When we first released code doing this, a prominent host immediately raised the red flag because their DB traffic shot through the roof.
>>> In this case you *do* want to do the join in Python to some extent, though you use the database to deliver the simplest information possible to work with first; you get the full row for all of the A entries, then a second query for all of B plus A's primary key that can be quickly matched to that of A.
>> This is what we're doing. Fetch the list of instances that match the filters, then, for the ones that were returned, get their metadata.
>> --Dan
Re: [openstack-dev] In memory joins in Nova
Thanks, this feedback was helpful. Perhaps my paraphrasing was misleading. I am not running openstack at scale in order to see how much the DB can sustain. My observation was that the host running nova services saturates on CPU much earlier than the DB does. Joins could be one of the reasons.

I also observed that background tasks like instance creation and resource/stats updates contend with get queries. In addition to caching optimizations, prioritizing tasks in nova could help. Is there a nova API to fetch the list of instances without metadata? Until I find a good way to profile openstack code, changing the queries can be a good experiment IMO.

On Wed, Aug 12, 2015 at 8:12 AM, Dan Smith d...@danplanet.com wrote:
>> If OTOH we are referring to the width of the columns and the join is such that you're going to get the same A identity over and over again, if you join A and B you get a wide row with all of A and B with a very large amount of redundant data sent over the wire again and again (note that the database drivers available to us in Python always send all rows and columns over the wire unconditionally, whether or not we fetch them in application code).
> Yep, it was this. N instances times M rows of metadata each. If you pull 100 instances and they each have 30 rows of system metadata, that's a lot of data, and most of it is the instance being repeated 30 times for each metadata row. When we first released code doing this, a prominent host immediately raised the red flag because their DB traffic shot through the roof.
>> In this case you *do* want to do the join in Python to some extent, though you use the database to deliver the simplest information possible to work with first; you get the full row for all of the A entries, then a second query for all of B plus A's primary key that can be quickly matched to that of A.
> This is what we're doing. Fetch the list of instances that match the filters, then, for the ones that were returned, get their metadata.
> --Dan
Re: [openstack-dev] In memory joins in Nova
Here are a few -- instance_get_all_by_filters joins manually with instances_fill_metadata:
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782

Almost all instance query functions manually join with instance_metadata. Another example was the compute_node_get_all function, which joined the compute_node, services, and ip tables. But it is simplified in the current codebase (I am working on Juno).

On Tue, Aug 11, 2015 at 3:09 PM, Clint Byrum cl...@fewbar.com wrote:
> Excerpts from Sachin Manpathak's message of 2015-08-12 05:40:36 +0800:
>> Hi folks, the Nova codebase seems to follow a manual-joins model, where all the data required by an API is fetched from multiple tables and then joined manually, in most cases using python dictionary lookups. I was wondering about the basic reasoning for doing so. I usually find openstack services to be CPU bound in a medium-sized environment, and non-trivial utilization seems to come from the parts of the code which do manual joins.
> Could you please cite specific examples so we can follow along with your thinking without having to repeat your analysis? Thanks!
Re: [openstack-dev] In memory joins in Nova
Just curious... have you measured this consuming a significant amount of CPU time? Or is it more a gut feel of "this looks like it might be expensive"?

Chris

On 08/11/2015 04:51 PM, Sachin Manpathak wrote:
> Here are a few -- instance_get_all_by_filters joins manually with instances_fill_metadata:
> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890
> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782
> Almost all instance query functions manually join with instance_metadata. Another example was the compute_node_get_all function, which joined the compute_node, services, and ip tables. But it is simplified in the current codebase (I am working on Juno).
>
> On Tue, Aug 11, 2015 at 3:09 PM, Clint Byrum cl...@fewbar.com wrote:
>> Excerpts from Sachin Manpathak's message of 2015-08-12 05:40:36 +0800:
>>> Hi folks, the Nova codebase seems to follow a manual-joins model, where all the data required by an API is fetched from multiple tables and then joined manually, in most cases using python dictionary lookups. I was wondering about the basic reasoning for doing so. I usually find openstack services to be CPU bound in a medium-sized environment, and non-trivial utilization seems to come from the parts of the code which do manual joins.
>> Could you please cite specific examples so we can follow along with your thinking without having to repeat your analysis? Thanks!
Re: [openstack-dev] In memory joins in Nova
Excerpts from Sachin Manpathak's message of 2015-08-12 05:40:36 +0800:
> Hi folks, the Nova codebase seems to follow a manual-joins model, where all the data required by an API is fetched from multiple tables and then joined manually, in most cases using python dictionary lookups. I was wondering about the basic reasoning for doing so. I usually find openstack services to be CPU bound in a medium-sized environment, and non-trivial utilization seems to come from the parts of the code which do manual joins.

Could you please cite specific examples so we can follow along with your thinking without having to repeat your analysis? Thanks!
Re: [openstack-dev] In memory joins in Nova
> Here are a few -- instance_get_all_by_filters joins manually with instances_fill_metadata:
> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890
> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782
> Almost all instance query functions manually join with instance_metadata.

This was done because joining metadata with instances was resulting in very large result sets from the database. Instance objects are very large, and each can have many rows in the metadata tables (especially in older versions of nova). When joining these tables and doing queries for all instances on a host or across a time interval, the result sets are very large (we had a big issue with this at the grizzly release, as reported by real deployments). So we're trading some overhead (and atomicity) for significantly lower DB traffic.

I don't think this is likely to be the source of your measured CPU overhead, though. The metadata joining is O(n), where n is the number of instances in the result. I would think that is much smaller than the overhead of processing the result of the query into ORM objects.

--Dan
Re: [openstack-dev] In memory joins in Nova
I am struggling with python code profiling in general. It has its own caveats, like 100%-plus overhead. However, on a host with only nova services (DB on a different host), I see CPU utilization spike up quickly with scale. The DB server is relatively calm and never goes over 20%. On a system which relies on the DB to fetch all the data, this should not happen. I could not find any analysis of nova performance either; I'd appreciate it if someone can point me to one.

Thanks,

On Tue, Aug 11, 2015 at 3:57 PM, Chris Friesen chris.frie...@windriver.com wrote:
> Just curious... have you measured this consuming a significant amount of CPU time? Or is it more a gut feel of "this looks like it might be expensive"?
>
> On 08/11/2015 04:51 PM, Sachin Manpathak wrote:
>> Here are a few -- instance_get_all_by_filters joins manually with instances_fill_metadata:
>> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890
>> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782
>> Almost all instance query functions manually join with instance_metadata. Another example was the compute_node_get_all function, which joined the compute_node, services, and ip tables. But it is simplified in the current codebase (I am working on Juno).
>>
>> On Tue, Aug 11, 2015 at 3:09 PM, Clint Byrum cl...@fewbar.com wrote:
>>> Excerpts from Sachin Manpathak's message of 2015-08-12 05:40:36 +0800:
>>>> Hi folks, the Nova codebase seems to follow a manual-joins model, where all the data required by an API is fetched from multiple tables and then joined manually, in most cases using python dictionary lookups. I was wondering about the basic reasoning for doing so. I usually find openstack services to be CPU bound in a medium-sized environment, and non-trivial utilization seems to come from the parts of the code which do manual joins.
>>> Could you please cite specific examples so we can follow along with your thinking without having to repeat your analysis? Thanks!
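For what it's worth, the basic stdlib recipe for profiling a single call is short. The handler below is a stand-in, not Nova code, and the 100%-plus wall-clock overhead of cProfile mentioned above still applies; sorting by cumulative time surfaces hot call chains rather than just hot leaf functions.

```python
import cProfile
import io
import pstats

def handle_api_call():
    # Stand-in workload; in practice you would wrap the real request path.
    return sum(i * i for i in range(10000))

profiler = cProfile.Profile()
result = profiler.runcall(handle_api_call)

# Rank by cumulative time so deep call chains show up near the top.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
report = out.getvalue()
print(report)
```

The same pattern works per-request by wrapping a WSGI handler and dumping stats only for slow calls, which keeps the overhead off the common path.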