On 8/11/15 7:14 PM, Sachin Manpathak wrote:
I am struggling with python code profiling in general. It has its own caveats, like 100%-plus overhead. However, on a host with only nova services (DB on a different host), I see CPU utilization spike up quickly with scale. The DB server is relatively calm and never goes over 20%. On a system which relies on the DB to fetch all the data, this should not happen.
The DB's resources are intended to scale up in response to a wide degree of concurrency, that is, lots and lots of API services all hitting it from many concurrent API calls. "With scale" here is a slippery term. What kind of concurrency are you testing with? How many CPUs serving API calls are utilized simultaneously? To saturate the database you need many dozens, and even then you don't want your database CPU going very high; 20% does not seem that low to me, actually.

I disagree with the notion that high database CPU indicates a performant application, or that DB saturation is a requirement for a database-delivered application to be performant; I think the opposite is true. In web application development, when I worked with production sites at high volume, the goal was to use enough caching so that the major site pages being viewed constantly could be delivered with *no* database access whatsoever. We wanted to see the majority of the site being sent to customers with the database at essentially zero; that is how you get page response times down from 200-300 ms to 20 or 30. If you want to measure performance, looking at API response time is a better first step than looking at CPU utilization.
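
To be concrete about measuring response time first, here is a minimal sketch that times a batch of API calls and reports the median and worst case. It assumes the "requests" library; the endpoint URL and token header are hypothetical placeholders, not values from any real deployment:

    # Minimal sketch: time a batch of API calls and report median / max latency.
    # The URL and token below are hypothetical placeholders.
    import time

    import requests

    URL = "http://nova-api.example.com:8774/v2.1/servers"   # hypothetical endpoint
    HEADERS = {"X-Auth-Token": "REPLACE_WITH_TOKEN"}         # hypothetical token

    samples = []
    for _ in range(20):
        start = time.perf_counter()
        requests.get(URL, headers=HEADERS)
        samples.append(time.perf_counter() - start)

    samples.sort()
    print("median: %.1f ms" % (samples[len(samples) // 2] * 1000))
    print("max:    %.1f ms" % (samples[-1] * 1000))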

That said, Python is a very CPU intensive language, because it is an interpreted scripting language. Operations that would be barely a whisper of CPU in a compiled language like C end up being major operations in Python. Openstack suffers from a large amount of function call overhead even for simple API operations, as it is an extremely layered system with very little use of caching. Until it moves to a JIT-based interpreter like PyPy that can flatten out call chains, the amount of overhead just for an API call to come in and go back out with a response will remain significant.

As for caching, making use of a technique such as memcached caching of data structures can also greatly improve performance, because we can cache pre-assembled data, removing the need to repeatedly extract it from multiple tables and piece it together in Python, which is itself a very CPU intensive activity. This is something that will be happening more in the future, but as it improves the performance of Openstack, it will be removing even more load from the database. Again, I'd look at API response times as the first thing to measure.
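
To illustrate the kind of caching I mean, here is a minimal sketch of storing a pre-assembled data structure in memcached. It assumes a local memcached server and the pymemcache client library; get_instance_dict() is a hypothetical stand-in for the expensive multi-table fetch-and-assemble step, not an actual Nova function:

    # Minimal sketch: cache a pre-assembled structure in memcached so repeat
    # requests skip both the database and the Python-side assembly work.
    # Assumes a local memcached server and the "pymemcache" client library;
    # get_instance_dict() is a hypothetical expensive builder, not Nova code.
    import json

    from pymemcache.client.base import Client

    client = Client(("127.0.0.1", 11211))

    def get_instance_view(instance_uuid):
        key = "instance_view:%s" % instance_uuid

        cached = client.get(key)
        if cached is not None:
            return json.loads(cached)             # cache hit: no SQL, no re-assembly

        view = get_instance_dict(instance_uuid)   # hypothetical multi-table fetch
        client.set(key, json.dumps(view), expire=60)
        return view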

That said, certainly the joining of data in Python may be unnecessary, and I'm not sure we can't revisit the history Dan refers to when he says there were "very large result sets". If we are referring to the number of rows, joining in SQL or in Python will still involve the same number of "rows", and SQLAlchemy also offers many techniques for reducing the overhead of fetching lots of rows which Nova currently doesn't make use of (see https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Eager_load_and_Column_load_tuning for a primer on this).
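
As one example of the column load tuning described on that wiki page, here is a minimal sketch using load_only(); the Instance model below is a toy illustration with a deliberately wide user_data column, not Nova's actual model:

    # Minimal sketch of column load tuning with load_only(): only the listed
    # columns appear in the SELECT; the wide user_data column is deferred and
    # is only fetched if actually accessed on a loaded object.
    # The Instance model here is illustrative, not Nova's actual model.
    from sqlalchemy import Column, Integer, String, Text, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import Session, load_only

    Base = declarative_base()

    class Instance(Base):
        __tablename__ = "instances"
        id = Column(Integer, primary_key=True)
        uuid = Column(String(36))
        host = Column(String(255))
        vm_state = Column(String(255))
        user_data = Column(Text)   # a wide column most API calls don't need

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = Session(engine)

    session.add(Instance(uuid="abc-123", host="compute1",
                         vm_state="active", user_data="x" * 100000))
    session.commit()

    query = session.query(Instance).options(
        load_only("uuid", "host", "vm_state"))
    for inst in query:
        print(inst.uuid, inst.host, inst.vm_state)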

If, OTOH, we are referring to the width of the columns, and the join is such that you're going to get the same A identity over and over again, then joining A and B gives you a "wide" row containing all of A and B, with a very large amount of redundant data sent over the wire again and again (note that the database drivers available to us in Python always send all rows and columns over the wire unconditionally, whether or not we fetch them in application code).

In this case you *do* want to do the join in Python to some extent, though you use the database to deliver the simplest information possible to work with first: you get the full row for all of the A entries, then a second query for all of B plus A's primary key, which can be quickly matched to that of A. SQLAlchemy offers this as "subquery eager loading", and it is definitely much more performant than a single full join when you have wide rows for individual entities. The database is still doing the join to the extent that it delivers the primary key information linking A and B, which can be operated upon very quickly in memory, as we already have all the A identities in a hash lookup in any case.
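
Here is a minimal sketch of subquery eager loading using a toy A/B mapping that mirrors the description above; the models and data are purely illustrative:

    # Minimal sketch of "subquery eager loading": two narrow queries instead of
    # one wide joined result.  SQLAlchemy emits a SELECT for the A rows, then a
    # second SELECT for the B rows joined against a subquery of A's primary
    # keys, and matches the B objects to their parent A objects in memory.
    from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import Session, relationship, subqueryload

    Base = declarative_base()

    class A(Base):
        __tablename__ = "a"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))
        bs = relationship("B")

    class B(Base):
        __tablename__ = "b"
        id = Column(Integer, primary_key=True)
        a_id = Column(Integer, ForeignKey("a.id"))
        data = Column(String(50))

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = Session(engine)

    session.add(A(name="a1", bs=[B(data="b1"), B(data="b2")]))
    session.commit()

    for a in session.query(A).options(subqueryload(A.bs)):
        print(a.name, [b.data for b in a.bs])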

Overall, if you're looking to make Openstack faster, the two things to look at are: 1. what is the response time of an API call, and 2. what do the Python profiles look like for those API calls? For a primer on Python profiling, see for example my own FAQ entry here: http://docs.sqlalchemy.org/en/rel_1_0/faq/performance.html#code-profiling. This kind of profiling is a lot of work and is very tedious compared to just running a big rally job and looking at the CPU overhead, but unfortunately it is the only way to get actually meaningful information as to why a Python application is slow. All other techniques tell us basically nothing about *why* something is slow.
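
For concreteness, here is a minimal sketch along the lines of the approach in that FAQ entry; handle_api_request() is a hypothetical stand-in for whatever code path is being measured:

    # Minimal sketch: run a single call under cProfile and print the top
    # functions sorted by cumulative time.  handle_api_request() is a
    # hypothetical placeholder for the code path being measured.
    import cProfile
    import io
    import pstats

    def profiled(fn, *args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()
        result = fn(*args, **kwargs)
        pr.disable()

        out = io.StringIO()
        pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(25)
        print(out.getvalue())
        return result

    # e.g. profiled(handle_api_request, request)   # hypothetical usage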




I could not find any analysis of nova performance either. I would appreciate it if someone could point me to one.

Thanks,





On Tue, Aug 11, 2015 at 3:57 PM, Chris Friesen <chris.frie...@windriver.com> wrote:

    Just curious...have you measured this consuming a significant
    amount of CPU time?  Or is it more a gut feel of "this looks like
    it might be expensive"?

    Chris


    On 08/11/2015 04:51 PM, Sachin Manpathak wrote:

        Here are a few --
        instance_get_all_by_filters joins manually with
        instances_fill_metadata --
        https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890
        https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782

        Almost all instance query functions manually join with
        instance_metadata.

        Another example was the compute_node_get_all function, which joined
        the compute_node, services, and ip tables. But it has been simplified
        in the current codebase (I am working on Juno).




        On Tue, Aug 11, 2015 at 3:09 PM, Clint Byrum <cl...@fewbar.com> wrote:

            Excerpts from Sachin Manpathak's message of 2015-08-12 05:40:36 +0800:
            > Hi folks,
            > Nova codebase seems to follow a manual joins model, where all data
            > required by an API is fetched from multiple tables and then joined
            > manually by using (in most cases) python dictionary lookups.
            >
            > I was wondering about the basic reasoning for doing so. I usually
            > find openstack services to be CPU bound in a medium sized
            > environment, and non-trivial utilization seems to come from parts
            > of code which do manual joins.

            Could you please cite specific examples so we can follow
        along with your
            thinking without having to repeat your analysis?

            Thanks!

        




        








__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
