On 03/06/2018 09:36 PM, Alex Xu wrote:
2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:

    2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com


        *From:* Matthew Booth [mailto:mbo...@redhat.com
        *Sent:* Saturday, March 3, 2018 4:15 PM
        *To:* OpenStack Development Mailing List (not for usage
        questions) <openstack-dev@lists.openstack.org
        *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple


        On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:

            On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:

                Hello Nova team,

                During the Cyborg discussion at the Rocky PTG, we
                proposed a flow for FPGAs wherein the request spec asks
                for a device type as a resource class, and optionally a
                function (such as encryption) in the extra specs. This
                does not seem to work well for the usage model that I’ll
                describe below.

                An FPGA device may implement more than one function. For
                example, it may implement both compression and
                encryption. Say a cluster has 10 devices of device type
                X, and each of them is programmed to offer 2 instances
                of function A and 4 instances of function B. More
                specifically, the device may implement 6 PCI functions,
                with 2 of them tied to function A, and the other 4 tied
                to function B. So, we could have 6 separate instances
                accessing functions on the same device.


        Does this imply that Cyborg can't reprogram the FPGA at all?

        */[Mooney, Sean K] Cyborg is intended to support fixed-function
        accelerators as well, so it will not always be able to program the
        accelerator. In this case, where an FPGA is preprogrammed with a
        statically provisioned multi-function bitstream, Cyborg
        will not be able to reprogram the slot if any of the functions
        from that slot are already allocated to an instance. It
        will have to treat it like a fixed-function device and
        simply allocate an unused VF of the correct type, if available./*


                In the current flow, the device type X is modeled as a
                resource class, so Placement will count how many of them
                are in use. A flavor for ‘RC device-type-X + function A’
                will consume one instance of the RC device-type-X.  But
                this is not right because this precludes other functions
                on the same device instance from getting used.

                One way to solve this is to declare functions A and B as
                resource classes themselves and have the flavor request
                the function RC. Placement will then correctly count the
                function instances. However, there is still a problem:
                if the requested function A is not available, Placement
                will return an empty list of RPs, but we need some way
                to reprogram some device to create an instance of
                function A.
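
To make the counting difference concrete, here is a minimal sketch in plain Python. It does not use the real Placement API: the `Inventory` class and the resource class names are hypothetical stand-ins (the only grounded detail is that custom resource classes carry a `CUSTOM_` prefix). It models the cluster described above, 10 devices each offering 2 instances of function A and 4 of function B, first with the device type as the resource class and then with the functions as resource classes.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    """Toy stand-in for a Placement inventory (hypothetical, not the real API)."""
    total: dict                      # resource class -> total units
    used: dict = field(default_factory=dict)

    def claim(self, resource_class: str, amount: int = 1) -> bool:
        """Consume `amount` units if capacity allows; return whether it worked."""
        in_use = self.used.get(resource_class, 0)
        if in_use + amount > self.total.get(resource_class, 0):
            return False
        self.used[resource_class] = in_use + amount
        return True


def count_successful_claims(inv: Inventory, resource_class: str, attempts: int) -> int:
    """Try `attempts` single-unit claims and count how many succeed."""
    return sum(inv.claim(resource_class) for _ in range(attempts))


DEVICES = 10

# Model 1: the device type is the resource class (names are assumptions).
by_device = Inventory(total={"CUSTOM_DEVICE_TYPE_X": DEVICES})

# Model 2: each function is its own resource class.
by_function = Inventory(total={
    "CUSTOM_FUNCTION_A": 2 * DEVICES,
    "CUSTOM_FUNCTION_B": 4 * DEVICES,
})

# The cluster physically offers 60 function instances in total.
device_model_claims = count_successful_claims(by_device, "CUSTOM_DEVICE_TYPE_X", 60)
function_a_claims = count_successful_claims(by_function, "CUSTOM_FUNCTION_A", 60)
function_b_claims = count_successful_claims(by_function, "CUSTOM_FUNCTION_B", 60)

print(device_model_claims, function_a_claims, function_b_claims)
```

Under these assumptions the first model admits only 10 instances even though 60 function instances exist, while the second model counts each function separately. Neither model says anything about what to do once function A runs out, which is the remaining problem.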

            Clearly, nova is not going to be reprogramming devices with
            an instance of a particular function.

            Cyborg might need a separate agent that listens to
            the nova notifications queue and, upon seeing an event that
            indicates a failed build due to lack of resources,
            tries to reprogram a device and then tries
            rebuilding the original request.
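
The agent idea can be sketched roughly as follows. This is plain Python with no messaging library: the event type string, the payload fields, and the `reprogram` and `rebuild` callbacks are all hypothetical, since the thread does not name the actual notification or the Cyborg calls involved.

```python
from typing import Callable

# Hypothetical event type; the thread does not say which notification is used.
FAILED_BUILD_EVENT = "build.failed.no_resources"


def handle_notification(
    event: dict,
    reprogram: Callable[[str], bool],
    rebuild: Callable[[str], None],
) -> str:
    """React to one notification.

    Returns "ignored" for unrelated events, "no_device" when no device
    could be reprogrammed, and "rebuilt" when the request was retried.
    """
    if event.get("event_type") != FAILED_BUILD_EVENT:
        return "ignored"

    payload = event["payload"]
    # Ask the (assumed) reprogramming hook to create the missing function.
    if not reprogram(payload["function"]):
        return "no_device"

    # Only retry the original request once a device offers the function.
    rebuild(payload["request_id"])
    return "rebuilt"
```

A caller would feed each message from the notifications queue to `handle_notification`; in this sketch the retry happens only after a successful reprogram, so a cluster with no reprogrammable device does not loop on the same failed request.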


        It was my understanding from that discussion that we intend to
        insert Cyborg into the spawn workflow for device configuration
        in the same way that we currently insert resources provided by
        Cinder and Neutron. So while Nova won't be reprogramming a
        device, it will be calling out to Cyborg to reprogram a device,
        and waiting while that happens.

        My understanding is (and I concede some areas are a little

        * The flavor says device type X with function Y
        * Placement tells us everywhere with device type X
        * A weigher orders these by devices which already have an
          available function Y (where is this metadata stored?)
        * Nova schedules to host Z
        * Nova host Z asks Cyborg for a local function Y and blocks
           * Cyborg hopefully returns function Y which is already
           * If not, Cyborg reprograms a function Y, then returns it

        Can anybody correct me/fill in the gaps?
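
The last two steps of that flow (return an existing function Y, otherwise reprogram) might look roughly like this on the Cyborg side. It is a sketch under stated assumptions: `Device`, `acquire_function`, and `instances_after_reprogram` are hypothetical names, not Cyborg's API. The one rule taken from the thread is that a device cannot be reprogrammed while any of its functions is allocated to an instance.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    free: dict = field(default_factory=dict)  # function name -> free instances
    allocated: int = 0                        # instances currently in use


class NoDeviceAvailable(Exception):
    """Raised when no function is free and no device can be reprogrammed."""


def acquire_function(devices, function, instances_after_reprogram=1):
    """Return (device name, was_reprogrammed) for one instance of `function`."""
    # Preferred path: a device already offers a free instance.
    for dev in devices:
        if dev.free.get(function, 0) > 0:
            dev.free[function] -= 1
            dev.allocated += 1
            return dev.name, False

    # Fallback: reprogram a device that has nothing allocated on it.
    for dev in devices:
        if dev.allocated == 0:
            # Assumption: the new image offers only the requested function.
            dev.free = {function: instances_after_reprogram - 1}
            dev.allocated = 1
            return dev.name, True

    raise NoDeviceAvailable(function)


devices = [Device("fpga0", {"B": 4}), Device("fpga1", {"A": 1})]
first = acquire_function(devices, "A")   # served from fpga1 as-is
second = acquire_function(devices, "A")  # fpga0 is idle, so it is reprogrammed
print(first, second)
```

In this sketch the caller simply blocks on `acquire_function`; how long a real reprogram takes, and whether Nova should wait for it, is exactly the open question in the flow above.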

        */[Mooney, Sean K] That correlates closely with my recollection
        as well. As for the metadata, I think the weigher may need to call
        out to Cyborg to retrieve it, as it will not be available in the
        host state object./*
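
A weigher that fetches this metadata from Cyborg could be sketched as below. This is a plain function rather than a real nova weigher class, and `free_instances` is a hypothetical callback standing in for whatever Cyborg query would return the number of free instances of a function on a host.

```python
from typing import Callable, Iterable, List


def weigh_hosts(
    hosts: Iterable[str],
    function: str,
    free_instances: Callable[[str, str], int],
) -> List[str]:
    """Order hosts so those with more free instances of `function` come first."""
    # One external lookup per candidate host: this is the cost of keeping
    # the metadata outside the host state object.
    weights = {host: free_instances(host, function) for host in hosts}
    # sorted() is stable, so hosts with equal weight keep their input order.
    return sorted(weights, key=lambda host: weights[host], reverse=True)
```

For example, with three candidate hosts where only the second has function Y free, the second host moves to the front and the other two keep their relative order. The number of lookups grows with the number of candidate hosts, so each scheduling request pays for one call per host.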

    Is this the nova scheduler weigher, or do we want to support weighing in
    Placement? Functions are traits, as I understand it, so could we have
    preferred_traits? I remember we talked about that parameter in the
    past, but we didn't have a good use case at the time. This is good
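
To illustrate what a preferred_traits parameter could mean, here is a small sketch of the semantics only. It is not an existing Placement feature or API: `select_providers` and the trait names are hypothetical. Required traits filter candidates out, while preferred traits only influence ordering.

```python
from typing import Dict, List, Set


def select_providers(
    providers: Dict[str, Set[str]],
    required: Set[str],
    preferred: Set[str],
) -> List[str]:
    """Filter by required traits, then order by number of preferred traits matched."""
    # Hard constraint: a provider must carry every required trait.
    candidates = [name for name, traits in providers.items() if required <= traits]
    # Soft constraint: more preferred traits sorts earlier; ties keep input order.
    return sorted(
        candidates,
        key=lambda name: len(preferred & providers[name]),
        reverse=True,
    )


# Hypothetical resource providers and trait names, for illustration only.
providers = {
    "rp1": {"CUSTOM_EXAMPLE_REQUIRED"},
    "rp2": {"CUSTOM_EXAMPLE_REQUIRED", "CUSTOM_FUNCTION_A"},
    "rp3": {"CUSTOM_FUNCTION_A"},
}
ordered = select_providers(
    providers,
    required={"CUSTOM_EXAMPLE_REQUIRED"},
    preferred={"CUSTOM_FUNCTION_A"},
)
print(ordered)
```

Here a provider that lacks function A is still returned, just later in the list, which matches the use case in this thread: a device without function A may still be usable after reprogramming.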

If we call Cyborg from the nova scheduler weigher, that will also slow down scheduling a lot.

Right, which is why I don't want to do any weighing in Placement at all. If folks want to sort by things that require long-running code/callbacks or silly temporal things like metrics, they can do that in a custom weigher in the nova-scheduler and take the performance hit there.


