On 8/11/15 7:14 PM, Sachin Manpathak wrote:
I am struggling with python code profiling in general. It has its own caveats, like 100%-plus overhead. However, on a host with only nova services (DB on a different host), I see CPU utilization spike up quickly with scale. The DB server is relatively calm and never goes over 20%. On a system which relies on the DB to fetch all the data, this should not happen.
The DB's resources are intended to scale up in response to a wide degree of concurrency, that is, lots and lots of API services all hitting it from many concurrent API calls. "With scale" here is a slippery term. What kind of concurrency are you testing with? How many CPUs serving API calls are utilized simultaneously? To saturate the database you need many dozens, and even then you don't want your database CPU going very high; 20% does not seem that low to me, actually.

I disagree with the notion that high database CPU indicates a performant application, or that DB saturation is a requirement for a database-delivered application to be performant; I think the opposite is true. In web application development, when I worked with production sites at high volume, the goal was to use enough caching so that the major site pages being viewed constantly could be delivered with *no* database access whatsoever. We wanted to see the majority of the site being sent to customers with the database at essentially zero; that is how you get page response times down from 200-300 ms to 20 or 30. If you want to measure performance, looking at API response time is a better first step than looking at CPU utilization.
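
To be concrete about measuring response time first, here is a minimal sketch that times a batch of API calls and reports the median and worst case. It assumes the "requests" library; the endpoint URL and token header are hypothetical placeholders, not values from any real deployment:

    # Minimal sketch: time a batch of API calls and report median / max latency.
    # The URL and token below are hypothetical placeholders.
    import time

    import requests

    URL = "http://nova-api.example.com:8774/v2.1/servers"   # hypothetical endpoint
    HEADERS = {"X-Auth-Token": "REPLACE_WITH_TOKEN"}         # hypothetical token

    samples = []
    for _ in range(20):
        start = time.perf_counter()
        requests.get(URL, headers=HEADERS)
        samples.append(time.perf_counter() - start)

    samples.sort()
    print("median: %.1f ms" % (samples[len(samples) // 2] * 1000))
    print("max:    %.1f ms" % (samples[-1] * 1000))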

That said, Python is a very CPU intensive language, because it is an interpreted scripting language. Operations that would be barely a whisper of CPU in a compiled language like C end up being major operations in Python. Openstack suffers from a large amount of function call overhead even for simple API operations, as it is an extremely layered system with very little use of caching. Until it moves to a JIT-based interpreter like PyPy that can flatten out call chains, the amount of overhead just for an API call to come in and go back out with a response will remain significant.

As for caching, making use of a technique such as memcached caching of data structures can also greatly improve performance, because we can cache pre-assembled data, removing the need to repeatedly extract it from multiple tables and piece it together in Python, which is itself a very CPU intensive activity. This is something that will be happening more in the future, but as it improves the performance of Openstack, it will be removing even more load from the database. Again, I'd look at API response times as the first thing to measure.
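
To illustrate the kind of caching I mean, here is a minimal sketch of storing a pre-assembled data structure in memcached. It assumes a local memcached server and the pymemcache client library; get_instance_dict() is a hypothetical stand-in for the expensive multi-table fetch-and-assemble step, not an actual Nova function:

    # Minimal sketch: cache a pre-assembled structure in memcached so repeat
    # requests skip both the database and the Python-side assembly work.
    # Assumes a local memcached server and the "pymemcache" client library;
    # get_instance_dict() is a hypothetical expensive builder, not Nova code.
    import json

    from pymemcache.client.base import Client

    client = Client(("127.0.0.1", 11211))

    def get_instance_view(instance_uuid):
        key = "instance_view:%s" % instance_uuid

        cached = client.get(key)
        if cached is not None:
            return json.loads(cached)             # cache hit: no SQL, no re-assembly

        view = get_instance_dict(instance_uuid)   # hypothetical multi-table fetch
        client.set(key, json.dumps(view), expire=60)
        return view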

That said, certainly the joining of data in Python may be unnecessary, and I'm not sure we can't revisit the history Dan refers to when he says there were "very large result sets". If we are referring to the number of rows, joining in SQL or in Python will still involve the same number of "rows", and SQLAlchemy also offers many techniques for reducing the overhead of fetching lots of rows which Nova currently doesn't make use of (see https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Eager_load_and_Column_load_tuning for a primer on this).
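
As one example of the column load tuning described on that wiki page, here is a minimal sketch using load_only(); the Instance model below is a toy illustration with a deliberately wide user_data column, not Nova's actual model:

    # Minimal sketch of column load tuning with load_only(): only the listed
    # columns appear in the SELECT; the wide user_data column is deferred and
    # is only fetched if actually accessed on a loaded object.
    # The Instance model here is illustrative, not Nova's actual model.
    from sqlalchemy import Column, Integer, String, Text, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import Session, load_only

    Base = declarative_base()

    class Instance(Base):
        __tablename__ = "instances"
        id = Column(Integer, primary_key=True)
        uuid = Column(String(36))
        host = Column(String(255))
        vm_state = Column(String(255))
        user_data = Column(Text)   # a wide column most API calls don't need

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = Session(engine)

    session.add(Instance(uuid="abc-123", host="compute1",
                         vm_state="active", user_data="x" * 100000))
    session.commit()

    query = session.query(Instance).options(
        load_only("uuid", "host", "vm_state"))
    for inst in query:
        print(inst.uuid, inst.host, inst.vm_state)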

If, OTOH, we are referring to the width of the columns, and the join is such that you're going to get the same A identity over and over again, then joining A and B gives you a "wide" row containing all of A and B, with a very large amount of redundant data sent over the wire again and again (note that the database drivers available to us in Python always send all rows and columns over the wire unconditionally, whether or not we fetch them in application code).

In this case you *do* want to do the join in Python to some extent, though you use the database to deliver the simplest information possible to work with first: you get the full row for all of the A entries, then a second query for all of B plus A's primary key, which can be quickly matched to that of A. SQLAlchemy offers this as "subquery eager loading", and it is definitely much more performant than a single full join when you have wide rows for individual entities. The database is still doing the join to the extent that it delivers the primary key information linking A and B, which can be operated upon very quickly in memory, as we already have all the A identities in a hash lookup in any case.
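
Here is a minimal sketch of subquery eager loading using a toy A/B mapping that mirrors the description above; the models and data are purely illustrative:

    # Minimal sketch of "subquery eager loading": two narrow queries instead of
    # one wide joined result.  SQLAlchemy emits a SELECT for the A rows, then a
    # second SELECT for the B rows joined against a subquery of A's primary
    # keys, and matches the B objects to their parent A objects in memory.
    from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import Session, relationship, subqueryload

    Base = declarative_base()

    class A(Base):
        __tablename__ = "a"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))
        bs = relationship("B")

    class B(Base):
        __tablename__ = "b"
        id = Column(Integer, primary_key=True)
        a_id = Column(Integer, ForeignKey("a.id"))
        data = Column(String(50))

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = Session(engine)

    session.add(A(name="a1", bs=[B(data="b1"), B(data="b2")]))
    session.commit()

    for a in session.query(A).options(subqueryload(A.bs)):
        print(a.name, [b.data for b in a.bs])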

Overall, if you're looking to make Openstack faster, the two things to look at are: 1. what is the response time of an API call, and 2. what do the Python profiles look like for those API calls? For a primer on Python profiling, see for example my own FAQ entry here: http://docs.sqlalchemy.org/en/rel_1_0/faq/performance.html#code-profiling. This kind of profiling is a lot of work and is very tedious compared to just running a big rally job and looking at the CPU overhead, but unfortunately it is the only way to get actually meaningful information as to why a Python application is slow. All other techniques tell us basically nothing about *why* something is slow.
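
For concreteness, here is a minimal sketch along the lines of the approach in that FAQ entry; handle_api_request() is a hypothetical stand-in for whatever code path is being measured:

    # Minimal sketch: run a single call under cProfile and print the top
    # functions sorted by cumulative time.  handle_api_request() is a
    # hypothetical placeholder for the code path being measured.
    import cProfile
    import io
    import pstats

    def profiled(fn, *args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()
        result = fn(*args, **kwargs)
        pr.disable()

        out = io.StringIO()
        pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(25)
        print(out.getvalue())
        return result

    # e.g. profiled(handle_api_request, request)   # hypothetical usage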




I could not find any analysis of nova performance either. I would appreciate it if someone could point me to one.

Thanks,





On Tue, Aug 11, 2015 at 3:57 PM, Chris Friesen <chris.frie...@windriver.com> wrote:

    Just curious...have you measured this consuming a significant
    amount of CPU time?  Or is it more a gut feel of "this looks like
    it might be expensive"?

    Chris


    On 08/11/2015 04:51 PM, Sachin Manpathak wrote:

        Here are a few --
        instance_get_all_by_filters joins manually with
        instances_fill_metadata --
        https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890
        https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782

        Almost all instance query functions manually join with
        instance_metadata.

        Another example was the compute_node_get_all function, which joined
        the compute_node, services, and ip tables. But it has been simplified
        in the current codebase (I am working on Juno).




        On Tue, Aug 11, 2015 at 3:09 PM, Clint Byrum <cl...@fewbar.com> wrote:

            Excerpts from Sachin Manpathak's message of 2015-08-12 05:40:36 +0800:
            > Hi folks,
            > Nova codebase seems to follow a manual joins model, where all data
            > required by an API is fetched from multiple tables and then joined
            > manually by using (in most cases) python dictionary lookups.
            >
            > I was wondering about the basic reasoning for doing so. I usually
            > find openstack services to be CPU bound in a medium sized
            > environment, and non-trivial utilization seems to come from parts
            > of code which do manual joins.

            Could you please cite specific examples so we can follow
        along with your
            thinking without having to repeat your analysis?

            Thanks!

        




        








__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
