Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-20 Thread 少合冯
2018-03-07 10:36 GMT+08:00 Alex Xu <sou...@gmail.com>:

>
>
> 2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:
>
>>
>>
>> 2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:
>>
>>>
>>>
>>>
>>>
>>> *From:* Matthew Booth [mailto:mbo...@redhat.com]
>>> *Sent:* Saturday, March 3, 2018 4:15 PM
>>> *To:* OpenStack Development Mailing List (not for usage questions) <
>>> openstack-dev@lists.openstack.org>
>>> *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
>>> functions
>>>
>>>
>>>
>>> On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:
>>>
>>> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>>>
>>> Hello Nova team,
>>>
>>>  During the Cyborg discussion at Rocky PTG, we proposed a flow for
>>> FPGAs wherein the request spec asks for a device type as a resource class,
>>> and optionally a function (such as encryption) in the extra specs. This
>>> does not seem to work well for the usage model that I’ll describe below.
>>>
>>> An FPGA device may implement more than one function. For example, it may
>>> implement both compression and encryption. Say a cluster has 10 devices of
>>> device type X, and each of them is programmed to offer 2 instances of
>>> function A and 4 instances of function B. More specifically, the device may
>>> implement 6 PCI functions, with 2 of them tied to function A, and the other
>>> 4 tied to function B. So, we could have 6 separate instances accessing
>>> functions on the same device.
>>>
>>>
>>>
>>> Does this imply that Cyborg can't reprogram the FPGA at all?
>>>
>>> *[Mooney, Sean K] Cyborg is intended to support fixed-function
>>> accelerators also, so it will not always be able to program the accelerator.
>>> In this case, where an FPGA is preprogrammed with a multi-function bitstream
>>> that is statically provisioned, Cyborg will not be able to reprogram the
>>> slot if any of the functions from that slot are already allocated to an
>>> instance. In this case it will have to treat it like a fixed-function
>>> device and simply allocate an unused VF of the correct type if available.*
>>>
>>>
>>>
>>>
>>>
>>> In the current flow, the device type X is modeled as a resource class,
>>> so Placement will count how many of them are in use. A flavor for ‘RC
>>> device-type-X + function A’ will consume one instance of the RC
>>> device-type-X.  But this is not right because this precludes other
>>> functions on the same device instance from getting used.
>>>
>>> One way to solve this is to declare functions A and B as resource
>>> classes themselves and have the flavor request the function RC. Placement
>>> will then correctly count the function instances. However, there is still a
>>> problem: if the requested function A is not available, Placement will
>>> return an empty list of RPs, but we need some way to reprogram some device
>>> to create an instance of function A.
>>>
>>>
>>> Clearly, nova is not going to be reprogramming devices with an instance
>>> of a particular function.
>>>
>>> Cyborg might need to have a separate agent that listens to the nova
>>> notifications queue and upon seeing an event that indicates a failed build
>>> due to lack of resources, then Cyborg can try and reprogram a device and
>>> then try rebuilding the original request.
>>>
>>>
>>>
>>> It was my understanding from that discussion that we intend to insert
>>> Cyborg into the spawn workflow for device configuration in the same way
>>> that we currently insert resources provided by Cinder and Neutron. So while
>>> Nova won't be reprogramming a device, it will be calling out to Cyborg to
>>> reprogram a device, and waiting while that happens.
>>>
>>> My understanding is (and I concede some areas are a little hazy):
>>>
>>> * The flavor says device type X with function Y
>>>
>>> * Placement tells us everywhere with device type X
>>>
>>> * A weigher orders these by devices which already have an available
>>> function Y (where is this metadata stored?)
>>>
>>> * Nova schedules to host Z
>>>
>>> * Nova host Z asks cyborg for a local function Y and blocks
>>>
>>>   * Cyborg hopefully

Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-19 Thread Alex Xu
2018-03-19 0:34 GMT+08:00 Nadathur, Sundar <sundar.nadat...@intel.com>:

> Sorry for the delayed response. I broadly agree with previous replies.
> For the concerns about the impact of the Cyborg weigher on scheduling
> performance, there are some options (apart from filtering candidates as
> much as possible in Placement):
> * Handle hosts in bulk by extending BaseWeigher and overriding
> weigh_objects(), instead of handling one host at a time.
>

That is still an external REST call; I guess people still won't like that.


>
> * If we have to handle one host at a time for whatever reason, since the
> weigher is maintained by Cyborg, it could directly query the Cyborg DB rather
> than go through the Cyborg REST API. This would be not unlike other weighers.
>

That means that whenever the Cyborg DB schema changes, we have to restart
nova-scheduler to update the weigher as well. We would be coupling the
upgrades of the two services together.


> Given these and other possible optimizations, it may be too soon to worry
> about the performance impact.
>

yea, maybe. What about the preferred traits?


>
> I am working on a spec that will capture the flow discussed in the PTG. I
> will try to address these aspects as well.
>
> Thanks & Regards,
> Sundar
>
>
> On 3/8/2018 4:53 AM, Zhipeng Huang wrote:
>
> @jay I'm also against a weigher in nova/placement. This should be an
> optional step that depends on the vendor implementation, not a default one.
>
> @Alex I think we should explore the idea of a preferred trait.
>
> @Matthew: Like Sean said, Cyborg wants to support both reprogrammable FPGAs
> and pre-programmed ones.
> Therefore it is correct that, in your description, the programming
> operation should be a call from Nova to Cyborg, and Cyborg will complete
> the operation while Nova waits. The only problem is that the weigher step
> should be an optional one.
>
>
> On Wed, Mar 7, 2018 at 9:21 PM, Jay Pipes <jaypi...@gmail.com> wrote:
>
>> On 03/06/2018 09:36 PM, Alex Xu wrote:
>>
>>> 2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:
>>>
>>> 2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:
>>>
>>> *From:* Matthew Booth [mailto:mbo...@redhat.com]
>>> *Sent:* Saturday, March 3, 2018 4:15 PM
>>> *To:* OpenStack Development Mailing List (not for usage
>>> questions) <openstack-dev@lists.openstack.org>
>>> *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
>>> functions
>>>
>>> On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:
>>>
>>> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>>>
>>> Hello Nova team,
>>>
>>>   During the Cyborg discussion at Rocky PTG, we
>>> proposed a flow for FPGAs wherein the request spec asks
>>> for a device type as a resource class, and optionally a
>>> function (such as encryption) in the extra specs. This
>>> does not seem to work well for the usage model that I’ll
>>> describe below.
>>>
>>> An FPGA device may implement more than one function. For
>>> example, it may implement both compression and
>>> encryption. Say a cluster has 10 devices of device type
>>> X, and each of them is programmed to offer 2 instances
>>> of function A and 4 instances of function B. More
>>> specifically, the device may implement 6 PCI functions,
>>> with 2 of them tied to function A, and the other 4 tied
>>> to function B. So, we could have 6 separate instances
>>> accessing functions on the same device.
>>>
>>>
>>> Does this imply that Cyborg can't reprogram the FPGA at all?
>>>
>>> *[Mooney, Sean K] Cyborg is intended to support fixed-function
>>> accelerators also, so it will not always be able to program the
>>> accelerator. In this case, where an FPGA is preprogrammed with a
>>> multi-function bitstream that is statically provisioned, Cyborg
>>> will not be able to reprogram the slot if any of the functions
>>> from that slot are already allocated to an instance. In this
>>> case it will have to treat it like a fixed-function device and
>>> simply allocate an unused VF of the correct type if available.*
>>>
>>>
>>> 
>>>
>>>
>>> In the current flow, the device type X is modeled as a
>>> resource class, so 

Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-19 Thread Zhipeng Huang
Hi Sundar,

I think the two points you raised are valid; please also reflect them in
the spec you are helping to draft :)

On Mon, Mar 19, 2018 at 12:34 AM, Nadathur, Sundar <
sundar.nadat...@intel.com> wrote:

> Sorry for the delayed response. I broadly agree with previous replies.
> For the concerns about the impact of the Cyborg weigher on scheduling
> performance, there are some options (apart from filtering candidates as
> much as possible in Placement):
> * Handle hosts in bulk by extending BaseWeigher and overriding
> weigh_objects(), instead of handling one host at a time.
> * If we have to handle one host at a time for whatever reason, since the
> weigher is maintained by Cyborg, it could directly query the Cyborg DB rather
> than go through the Cyborg REST API. This would be not unlike other weighers.
>
> Given these and other possible optimizations, it may be too soon to worry
> about the performance impact.
>
> I am working on a spec that will capture the flow discussed in the PTG. I
> will try to address these aspects as well.
>
> Thanks & Regards,
> Sundar
>
>
> On 3/8/2018 4:53 AM, Zhipeng Huang wrote:
>
> @jay I'm also against a weigher in nova/placement. This should be an
> optional step that depends on the vendor implementation, not a default one.
>
> @Alex I think we should explore the idea of a preferred trait.
>
> @Matthew: Like Sean said, Cyborg wants to support both reprogrammable FPGAs
> and pre-programmed ones.
> Therefore it is correct that, in your description, the programming
> operation should be a call from Nova to Cyborg, and Cyborg will complete
> the operation while Nova waits. The only problem is that the weigher step
> should be an optional one.
>
>
> On Wed, Mar 7, 2018 at 9:21 PM, Jay Pipes <jaypi...@gmail.com> wrote:
>
>> On 03/06/2018 09:36 PM, Alex Xu wrote:
>>
>>> 2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:
>>>
>>> 2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:
>>>
>>> *From:* Matthew Booth [mailto:mbo...@redhat.com]
>>> *Sent:* Saturday, March 3, 2018 4:15 PM
>>> *To:* OpenStack Development Mailing List (not for usage
>>> questions) <openstack-dev@lists.openstack.org>
>>> *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
>>> functions
>>>
>>> On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:
>>>
>>> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>>>
>>> Hello Nova team,
>>>
>>>   During the Cyborg discussion at Rocky PTG, we
>>> proposed a flow for FPGAs wherein the request spec asks
>>> for a device type as a resource class, and optionally a
>>> function (such as encryption) in the extra specs. This
>>> does not seem to work well for the usage model that I’ll
>>> describe below.
>>>
>>> An FPGA device may implement more than one function. For
>>> example, it may implement both compression and
>>> encryption. Say a cluster has 10 devices of device type
>>> X, and each of them is programmed to offer 2 instances
>>> of function A and 4 instances of function B. More
>>> specifically, the device may implement 6 PCI functions,
>>> with 2 of them tied to function A, and the other 4 tied
>>> to function B. So, we could have 6 separate instances
>>> accessing functions on the same device.
>>>
>>>
>>> Does this imply that Cyborg can't reprogram the FPGA at all?
>>>
>>> *[Mooney, Sean K] Cyborg is intended to support fixed-function
>>> accelerators also, so it will not always be able to program the
>>> accelerator. In this case, where an FPGA is preprogrammed with a
>>> multi-function bitstream that is statically provisioned, Cyborg
>>> will not be able to reprogram the slot if any of the functions
>>> from that slot are already allocated to an instance. In this
>>> case it will have to treat it like a fixed-function device and
>>> simply allocate an unused VF of the correct type if available.*
>>>
>>>
>>> 
>>>
>>>
>>> In the current flow, the device type X is modeled as a
>>> resource class, so Placement will count how many of them
>>> are in use. A flavor for ‘RC device-type-X + function A’
>>> will consume 

Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-18 Thread Nadathur, Sundar

Sorry for the delayed response. I broadly agree with previous replies.

For the concerns about the impact of the Cyborg weigher on scheduling
performance, there are some options (apart from filtering candidates as
much as possible in Placement):
* Handle hosts in bulk by extending BaseWeigher and overriding
weigh_objects(), instead of handling one host at a time.
* If we have to handle one host at a time for whatever reason, since the
weigher is maintained by Cyborg, it could directly query the Cyborg DB
rather than go through the Cyborg REST API. This would be not unlike other
weighers.
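
To make the bulk option concrete, here is a minimal Python sketch. It is an illustration under assumptions, not Nova or Cyborg code: BaseWeigher below is a simplified stand-in that only mirrors the two hook names of Nova's class, and HostState, fetch_free_functions, the "function" key, and the sample inventory are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical stand-in for the scheduler's per-host state object."""
    host: str


class BaseWeigher:
    """Simplified stand-in exposing the same two hooks as Nova's BaseWeigher."""

    def _weigh_object(self, obj, weight_properties):
        raise NotImplementedError

    def weigh_objects(self, weighed_obj_list, weight_properties):
        # Default behaviour: one call per host.
        return [self._weigh_object(obj, weight_properties)
                for obj in weighed_obj_list]


def fetch_free_functions(hosts, function):
    """Hypothetical bulk lookup; a real one would ask Cyborg once for all hosts."""
    inventory = {"host1": {"A": 2, "B": 0}, "host2": {"A": 0, "B": 4}}
    return {h: inventory.get(h, {}).get(function, 0) for h in hosts}


class CyborgFunctionWeigher(BaseWeigher):
    """Weighs all hosts with a single lookup instead of one lookup per host."""

    def weigh_objects(self, weighed_obj_list, weight_properties):
        function = weight_properties["function"]  # assumed request field
        hosts = [obj.host for obj in weighed_obj_list]
        free = fetch_free_functions(hosts, function)  # one call for the batch
        # More free instances of the requested function means a higher weight.
        return [float(free[obj.host]) for obj in weighed_obj_list]
```

The point of overriding weigh_objects rather than _weigh_object is that the cost of reaching Cyborg is paid once per scheduling request, however many candidate hosts Placement returns.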


Given these and other possible optimizations, it may be too soon to 
worry about the performance impact.


I am working on a spec that will capture the flow discussed in the PTG. 
I will try to address these aspects as well.


Thanks & Regards,
Sundar

On 3/8/2018 4:53 AM, Zhipeng Huang wrote:
@jay I'm also against a weigher in nova/placement. This should be an
optional step that depends on the vendor implementation, not a default one.


@Alex I think we should explore the idea of a preferred trait.

@Matthew: Like Sean said, Cyborg wants to support both reprogrammable
FPGAs and pre-programmed ones.
Therefore it is correct that, in your description, the programming
operation should be a call from Nova to Cyborg, and Cyborg will
complete the operation while Nova waits. The only problem is that the
weigher step should be an optional one.



On Wed, Mar 7, 2018 at 9:21 PM, Jay Pipes <jaypi...@gmail.com> wrote:

On 03/06/2018 09:36 PM, Alex Xu wrote:

2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:

    2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:

        *From:* Matthew Booth [mailto:mbo...@redhat.com]
        *Sent:* Saturday, March 3, 2018 4:15 PM
        *To:* OpenStack Development Mailing List (not for usage
        questions) <openstack-dev@lists.openstack.org>
        *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
        functions

        On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:

            On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:

                Hello Nova team,

                  During the Cyborg discussion at Rocky PTG, we
                proposed a flow for FPGAs wherein the request spec asks
                for a device type as a resource class, and optionally a
                function (such as encryption) in the extra specs. This
                does not seem to work well for the usage model that I’ll
                describe below.

                An FPGA device may implement more than one function. For
                example, it may implement both compression and
                encryption. Say a cluster has 10 devices of device type
                X, and each of them is programmed to offer 2 instances
                of function A and 4 instances of function B. More
                specifically, the device may implement 6 PCI functions,
                with 2 of them tied to function A, and the other 4 tied
                to function B. So, we could have 6 separate instances
                accessing functions on the same device.


        Does this imply that Cyborg can't reprogram the FPGA at all?

        *[Mooney, Sean K] Cyborg is intended to support fixed-function
        accelerators also, so it will not always be able to program the
        accelerator. In this case, where an FPGA is preprogrammed with a
        multi-function bitstream that is statically provisioned, Cyborg
        will not be able to reprogram the slot if any of the functions
        from that slot are already allocated to an instance. In this
        case it will have to 

Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-08 Thread Zhipeng Huang
@jay I'm also against a weigher in nova/placement. This should be an
optional step that depends on the vendor implementation, not a default one.

@Alex I think we should explore the idea of a preferred trait.

@Matthew: Like Sean said, Cyborg wants to support both reprogrammable FPGAs
and pre-programmed ones.
Therefore it is correct that, in your description, the programming operation
should be a call from Nova to Cyborg, and Cyborg will complete the
operation while Nova waits. The only problem is that the weigher step
should be an optional one.
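
As a rough illustration of the host-side step described above (Nova asks Cyborg for a function and waits), here is a minimal Python sketch. It is a toy model under assumptions: FunctionPool, its fields, and the returned strings are hypothetical and do not correspond to any Cyborg API.

```python
class FunctionPool:
    """Hypothetical model of the accelerator functions on one host."""

    def __init__(self, free, reprogrammable_slots):
        self.free = dict(free)  # function name -> free instance count
        self.reprogrammable_slots = reprogrammable_slots  # idle slots

    def acquire(self, function):
        """Return how the request was satisfied, or raise if it cannot be."""
        # Preferred path: a function of the right type is already free.
        if self.free.get(function, 0) > 0:
            self.free[function] -= 1
            return "existing"
        # Fallback: reprogram an idle slot, if the device allows it.
        if self.reprogrammable_slots > 0:
            self.reprogrammable_slots -= 1
            return "reprogrammed"
        # Fixed-function or fully allocated device: nothing to hand out.
        raise LookupError(f"no function {function!r} available")
```

A pre-programmed device is simply a pool with reprogrammable_slots set to 0, so the same call covers both kinds of device, and no weigher is needed for the call itself to work.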


On Wed, Mar 7, 2018 at 9:21 PM, Jay Pipes <jaypi...@gmail.com> wrote:

> On 03/06/2018 09:36 PM, Alex Xu wrote:
>
>> 2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:
>>
>> 2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:
>>
>> *From:* Matthew Booth [mailto:mbo...@redhat.com]
>> *Sent:* Saturday, March 3, 2018 4:15 PM
>> *To:* OpenStack Development Mailing List (not for usage
>> questions) <openstack-dev@lists.openstack.org>
>> *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
>> functions
>>
>> On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:
>>
>> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>>
>> Hello Nova team,
>>
>>   During the Cyborg discussion at Rocky PTG, we
>> proposed a flow for FPGAs wherein the request spec asks
>> for a device type as a resource class, and optionally a
>> function (such as encryption) in the extra specs. This
>> does not seem to work well for the usage model that I’ll
>> describe below.
>>
>> An FPGA device may implement more than one function. For
>> example, it may implement both compression and
>> encryption. Say a cluster has 10 devices of device type
>> X, and each of them is programmed to offer 2 instances
>> of function A and 4 instances of function B. More
>> specifically, the device may implement 6 PCI functions,
>> with 2 of them tied to function A, and the other 4 tied
>> to function B. So, we could have 6 separate instances
>> accessing functions on the same device.
>>
>>
>> Does this imply that Cyborg can't reprogram the FPGA at all?
>>
>> *[Mooney, Sean K] Cyborg is intended to support fixed-function
>> accelerators also, so it will not always be able to program the
>> accelerator. In this case, where an FPGA is preprogrammed with a
>> multi-function bitstream that is statically provisioned, Cyborg
>> will not be able to reprogram the slot if any of the functions
>> from that slot are already allocated to an instance. In this
>> case it will have to treat it like a fixed-function device and
>> simply allocate an unused VF of the correct type if available.*
>>
>>
>> 
>>
>>
>> In the current flow, the device type X is modeled as a
>> resource class, so Placement will count how many of them
>> are in use. A flavor for ‘RC device-type-X + function A’
>> will consume one instance of the RC device-type-X.  But
>> this is not right because this precludes other functions
>> on the same device instance from getting used.
>>
>> One way to solve this is to declare functions A and B as
>> resource classes themselves and have the flavor request
>> the function RC. Placement will then correctly count the
>> function instances. However, there is still a problem:
>> if the requested function A is not available, Placement
>> will return an empty list of RPs, but we need some way
>> to reprogram some device to create an instance of
>> function A.
>>
>>
>> Clearly, nova is not going to be reprogramming devices with
>> an instance of a particular function.
>>
>> Cyborg might need to have a separate agent that listens to
>> the nova notifications queue and upon seeing an event that
>> indicates a failed build due to lack of resources, then
>> Cyborg can try and reprogram a device and then try
>> rebuilding the original request.
>>
>> __ __
>>
>> It was my understanding from that discussion that we intend to
>> insert Cyborg into the spawn workflow for 

Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-07 Thread Jay Pipes

On 03/06/2018 09:36 PM, Alex Xu wrote:
2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:

2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:

*From:* Matthew Booth [mailto:mbo...@redhat.com]
*Sent:* Saturday, March 3, 2018 4:15 PM
*To:* OpenStack Development Mailing List (not for usage
questions) <openstack-dev@lists.openstack.org>
*Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
functions

On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:

On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:

Hello Nova team,

  During the Cyborg discussion at Rocky PTG, we
proposed a flow for FPGAs wherein the request spec asks
for a device type as a resource class, and optionally a
function (such as encryption) in the extra specs. This
does not seem to work well for the usage model that I’ll
describe below.

An FPGA device may implement more than one function. For
example, it may implement both compression and
encryption. Say a cluster has 10 devices of device type
X, and each of them is programmed to offer 2 instances
of function A and 4 instances of function B. More
specifically, the device may implement 6 PCI functions,
with 2 of them tied to function A, and the other 4 tied
to function B. So, we could have 6 separate instances
accessing functions on the same device.


Does this imply that Cyborg can't reprogram the FPGA at all?

*[Mooney, Sean K] Cyborg is intended to support fixed-function
accelerators also, so it will not always be able to program the
accelerator. In this case, where an FPGA is preprogrammed with a
multi-function bitstream that is statically provisioned, Cyborg
will not be able to reprogram the slot if any of the functions
from that slot are already allocated to an instance. In this
case it will have to treat it like a fixed-function device and
simply allocate an unused VF of the correct type if available.*





In the current flow, the device type X is modeled as a
resource class, so Placement will count how many of them
are in use. A flavor for ‘RC device-type-X + function A’
will consume one instance of the RC device-type-X.  But
this is not right because this precludes other functions
on the same device instance from getting used.

One way to solve this is to declare functions A and B as
resource classes themselves and have the flavor request
the function RC. Placement will then correctly count the
function instances. However, there is still a problem:
if the requested function A is not available, Placement
will return an empty list of RPs, but we need some way
to reprogram some device to create an instance of
function A.


Clearly, nova is not going to be reprogramming devices with
an instance of a particular function.

Cyborg might need to have a separate agent that listens to
the nova notifications queue and upon seeing an event that
indicates a failed build due to lack of resources, then
Cyborg can try and reprogram a device and then try
rebuilding the original request.

__ __

It was my understanding from that discussion that we intend to
insert Cyborg into the spawn workflow for device configuration
in the same way that we currently insert resources provided by
Cinder and Neutron. So while Nova won't be reprogramming a
device, it will be calling out to Cyborg to reprogram a device,
and waiting while that happens.

My understanding is (and I concede some areas are a little
hazy):

* The flavor says device type X with function Y

* Placement tells us everywhere with device type X

* A weigher orders these by devices which already have an
available function Y (where is this metadata stored?)

* Nova schedules to host Z

* Nova host Z asks cyborg for a local function Y and blocks

   * Cyborg hopefully returns function Y which is already
available

   * If not, Cyborg reprograms 

Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-06 Thread Alex Xu
2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:

>
>
> 2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:
>
>>
>>
>>
>>
>> *From:* Matthew Booth [mailto:mbo...@redhat.com]
>> *Sent:* Saturday, March 3, 2018 4:15 PM
>> *To:* OpenStack Development Mailing List (not for usage questions) <
>> openstack-dev@lists.openstack.org>
>> *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
>> functions
>>
>>
>>
>> On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:
>>
>> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>>
>> Hello Nova team,
>>
>>  During the Cyborg discussion at Rocky PTG, we proposed a flow for
>> FPGAs wherein the request spec asks for a device type as a resource class,
>> and optionally a function (such as encryption) in the extra specs. This
>> does not seem to work well for the usage model that I’ll describe below.
>>
>> An FPGA device may implement more than one function. For example, it may
>> implement both compression and encryption. Say a cluster has 10 devices of
>> device type X, and each of them is programmed to offer 2 instances of
>> function A and 4 instances of function B. More specifically, the device may
>> implement 6 PCI functions, with 2 of them tied to function A, and the other
>> 4 tied to function B. So, we could have 6 separate instances accessing
>> functions on the same device.
>>
>>
>>
>> Does this imply that Cyborg can't reprogram the FPGA at all?
>>
>> *[Mooney, Sean K] Cyborg is intended to support fixed-function
>> accelerators also, so it will not always be able to program the accelerator.
>> In this case, where an FPGA is preprogrammed with a multi-function bitstream
>> that is statically provisioned, Cyborg will not be able to reprogram the
>> slot if any of the functions from that slot are already allocated to an
>> instance. In this case it will have to treat it like a fixed-function
>> device and simply allocate an unused VF of the correct type if available.*
>>
>>
>>
>>
>>
>> In the current flow, the device type X is modeled as a resource class, so
>> Placement will count how many of them are in use. A flavor for ‘RC
>> device-type-X + function A’ will consume one instance of the RC
>> device-type-X.  But this is not right because this precludes other
>> functions on the same device instance from getting used.
>>
>> One way to solve this is to declare functions A and B as resource classes
>> themselves and have the flavor request the function RC. Placement will then
>> correctly count the function instances. However, there is still a problem:
>> if the requested function A is not available, Placement will return an
>> empty list of RPs, but we need some way to reprogram some device to create
>> an instance of function A.
>>
>>
>> Clearly, nova is not going to be reprogramming devices with an instance
>> of a particular function.
>>
>> Cyborg might need to have a separate agent that listens to the nova
>> notifications queue and upon seeing an event that indicates a failed build
>> due to lack of resources, then Cyborg can try and reprogram a device and
>> then try rebuilding the original request.
>>
>>
>>
>> It was my understanding from that discussion that we intend to insert
>> Cyborg into the spawn workflow for device configuration in the same way
>> that we currently insert resources provided by Cinder and Neutron. So while
>> Nova won't be reprogramming a device, it will be calling out to Cyborg to
>> reprogram a device, and waiting while that happens.
>>
>> My understanding is (and I concede some areas are a little hazy):
>>
>> * The flavor says device type X with function Y
>>
>> * Placement tells us everywhere with device type X
>>
>> * A weigher orders these by devices which already have an available
>> function Y (where is this metadata stored?)
>>
>> * Nova schedules to host Z
>>
>> * Nova host Z asks cyborg for a local function Y and blocks
>>
>>   * Cyborg hopefully returns function Y which is already available
>>
>>   * If not, Cyborg reprograms a function Y, then returns it
>>
>> Can anybody correct me/fill in the gaps?
>>
>> *[Mooney, Sean K] that correlates closely to my recollection also. As for
>> the metadata I think the weigher may need to call to cyborg to retrieve
>> this as it will not be available in the host state object.*
>>
> Is it the no

Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-06 Thread Alex Xu
2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:

>
>
>
>
> *From:* Matthew Booth [mailto:mbo...@redhat.com]
> *Sent:* Saturday, March 3, 2018 4:15 PM
> *To:* OpenStack Development Mailing List (not for usage questions) <
> openstack-dev@lists.openstack.org>
> *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions
>
>
>
> On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:
>
> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>
> Hello Nova team,
>
>  During the Cyborg discussion at Rocky PTG, we proposed a flow for
> FPGAs wherein the request spec asks for a device type as a resource class,
> and optionally a function (such as encryption) in the extra specs. This
> does not seem to work well for the usage model that I’ll describe below.
>
> An FPGA device may implement more than one function. For example, it may
> implement both compression and encryption. Say a cluster has 10 devices of
> device type X, and each of them is programmed to offer 2 instances of
> function A and 4 instances of function B. More specifically, the device may
> implement 6 PCI functions, with 2 of them tied to function A, and the other
> 4 tied to function B. So, we could have 6 separate instances accessing
> functions on the same device.
>
>
>
> Does this imply that Cyborg can't reprogram the FPGA at all?
>
> *[Mooney, Sean K] Cyborg is intended to support fixed-function accelerators
> also, so it will not always be able to program the accelerator. In this case,
> where an FPGA is preprogrammed with a multi-function bitstream that is
> statically provisioned, Cyborg will not be able to reprogram the slot if
> any of the functions from that slot are already allocated to an instance. In
> this case it will have to treat it like a fixed-function device and simply
> allocate an unused VF of the correct type if available.*
>
>
>
>
>
> In the current flow, the device type X is modeled as a resource class, so
> Placement will count how many of them are in use. A flavor for ‘RC
> device-type-X + function A’ will consume one instance of the RC
> device-type-X.  But this is not right because this precludes other
> functions on the same device instance from getting used.
>
> One way to solve this is to declare functions A and B as resource classes
> themselves and have the flavor request the function RC. Placement will then
> correctly count the function instances. However, there is still a problem:
> if the requested function A is not available, Placement will return an
> empty list of RPs, but we need some way to reprogram some device to create
> an instance of function A.
>
>
> Clearly, nova is not going to be reprogramming devices with an instance of
> a particular function.
>
> Cyborg might need to have a separate agent that listens to the nova
> notifications queue and upon seeing an event that indicates a failed build
> due to lack of resources, then Cyborg can try and reprogram a device and
> then try rebuilding the original request.
>
>
>
> It was my understanding from that discussion that we intend to insert
> Cyborg into the spawn workflow for device configuration in the same way
> that we currently insert resources provided by Cinder and Neutron. So while
> Nova won't be reprogramming a device, it will be calling out to Cyborg to
> reprogram a device, and waiting while that happens.
>
> My understanding is (and I concede some areas are a little hazy):
>
> * The flavor says device type X with function Y
>
> * Placement tells us everywhere with device type X
>
> * A weigher orders these by devices which already have an available
> function Y (where is this metadata stored?)
>
> * Nova schedules to host Z
>
> * Nova host Z asks cyborg for a local function Y and blocks
>
>   * Cyborg hopefully returns function Y which is already available
>
>   * If not, Cyborg reprograms a function Y, then returns it
>
> Can anybody correct me/fill in the gaps?
>
> *[Mooney, Sean K] That correlates closely with my recollection as well. As
> for the metadata, I think the weigher may need to call Cyborg to retrieve
> it, as it will not be available in the host state object.*
>
Is this the Nova scheduler weigher, or do we want to support weighing in
Placement? I think a function is a trait, so could we have preferred_traits?
I remember we talked about that parameter in the past, but we didn't have a
good use case at the time. This is a good use case.
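
To make the preferred_traits idea concrete, here is a minimal sketch in plain
Python. It is not the Placement API: rank_providers, the provider dicts, and
the trait names are all hypothetical, and preferred_traits itself is only the
parameter being proposed in this thread. Required traits filter providers out;
preferred traits only change the order.

```python
def rank_providers(providers, required, preferred):
    """Filter by required traits, then order by preferred traits matched.

    Hypothetical helper: 'providers' is a list of dicts with a 'traits' set.
    """
    # A provider is eligible only if it carries every required trait.
    eligible = [p for p in providers if required <= p["traits"]]
    # More preferred traits matched sorts first; ties keep their input order.
    return sorted(
        eligible,
        key=lambda p: len(preferred & p["traits"]),
        reverse=True,
    )


providers = [
    {"name": "rp1", "traits": {"DEVICE_X"}},
    {"name": "rp2", "traits": {"DEVICE_X", "FUNCTION_Y"}},
    {"name": "rp3", "traits": {"FUNCTION_Y"}},
]
ranked = rank_providers(providers, {"DEVICE_X"}, {"FUNCTION_Y"})
```

With this input, rp3 is dropped for lacking the device type, and rp2 is ranked
ahead of rp1 because it already offers the function.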


> Matt
>
>
>
> --
>
> Matthew Booth
>
> Red Hat OpenStack Engineer, Compute DFG
>
>
>
> Phone: +442070094448 (UK)
>
>
>
> __

Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-06 Thread Mooney, Sean K


From: Matthew Booth [mailto:mbo...@redhat.com]
Sent: Saturday, March 3, 2018 4:15 PM
To: OpenStack Development Mailing List (not for usage questions) 
<openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:
On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
Hello Nova team,

 During the Cyborg discussion at Rocky PTG, we proposed a flow for FPGAs 
wherein the request spec asks for a device type as a resource class, and 
optionally a function (such as encryption) in the extra specs. This does not 
seem to work well for the usage model that I’ll describe below.

An FPGA device may implement more than one function. For example, it may 
implement both compression and encryption. Say a cluster has 10 devices of 
device type X, and each of them is programmed to offer 2 instances of function 
A and 4 instances of function B. More specifically, the device may implement 6 
PCI functions, with 2 of them tied to function A, and the other 4 tied to 
function B. So, we could have 6 separate instances accessing functions on the 
same device.

Does this imply that Cyborg can't reprogram the FPGA at all?
[Mooney, Sean K] Cyborg is intended to support fixed-function accelerators also, 
so it will not always be able to program the accelerator. In this case, where an 
FPGA is preprogrammed with a multi-function bitstream that is statically 
provisioned, Cyborg will not be able to reprogram the slot if any of the 
functions from that slot are already allocated to an instance. In this case it 
will have to treat it like a fixed-function device and simply allocate an unused 
VF of the correct type if available.
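
A rough sketch of that rule, assuming a slot is represented as a list of VF
records. The record layout and the names can_reprogram and allocate_vf are
hypothetical and not taken from Cyborg:

```python
def can_reprogram(vfs):
    """A slot may be reprogrammed only if none of its VFs is allocated."""
    return all(vf["instance"] is None for vf in vfs)


def allocate_vf(vfs, function, instance):
    """Hand out an unused VF of the requested function, or None if none is free."""
    for vf in vfs:
        if vf["function"] == function and vf["instance"] is None:
            vf["instance"] = instance
            return vf["address"]
    return None


# One statically provisioned slot offering two VFs of A and one of B.
slot = [
    {"address": "vf0", "function": "A", "instance": None},
    {"address": "vf1", "function": "A", "instance": None},
    {"address": "vf2", "function": "B", "instance": None},
]
```

Once any VF in the slot is handed to an instance, can_reprogram returns False
and the slot behaves like a fixed-function device.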



In the current flow, the device type X is modeled as a resource class, so 
Placement will count how many of them are in use. A flavor for ‘RC 
device-type-X + function A’ will consume one instance of the RC device-type-X.  
But this is not right because this precludes other functions on the same device 
instance from getting used.

One way to solve this is to declare functions A and B as resource classes 
themselves and have the flavor request the function RC. Placement will then 
correctly count the function instances. However, there is still a problem: if 
the requested function A is not available, Placement will return an empty list 
of RPs, but we need some way to reprogram some device to create an instance of 
function A.

Clearly, nova is not going to be reprogramming devices with an instance of a 
particular function.

Cyborg might need to have a separate agent that listens to the nova 
notifications queue and upon seeing an event that indicates a failed build due 
to lack of resources, then Cyborg can try and reprogram a device and then try 
rebuilding the original request.

It was my understanding from that discussion that we intend to insert Cyborg 
into the spawn workflow for device configuration in the same way that we 
currently insert resources provided by Cinder and Neutron. So while Nova won't 
be reprogramming a device, it will be calling out to Cyborg to reprogram a 
device, and waiting while that happens.
My understanding is (and I concede some areas are a little hazy):
* The flavor says device type X with function Y
* Placement tells us everywhere with device type X
* A weigher orders these by devices which already have an available function Y 
(where is this metadata stored?)
* Nova schedules to host Z
* Nova host Z asks cyborg for a local function Y and blocks
  * Cyborg hopefully returns function Y which is already available
  * If not, Cyborg reprograms a function Y, then returns it
Can anybody correct me/fill in the gaps?
[Mooney, Sean K] That correlates closely with my recollection as well. As for 
the metadata, I think the weigher may need to call Cyborg to retrieve it, as it 
will not be available in the host state object.
Matt


--
Matthew Booth
Red Hat OpenStack Engineer, Compute DFG

Phone: +442070094448 (UK)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-03 Thread Matthew Booth
On 2 March 2018 at 14:31, Jay Pipes wrote:

> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>
>> Hello Nova team,
>>
>>  During the Cyborg discussion at Rocky PTG, we proposed a flow for
>> FPGAs wherein the request spec asks for a device type as a resource class,
>> and optionally a function (such as encryption) in the extra specs. This
>> does not seem to work well for the usage model that I’ll describe below.
>>
>> An FPGA device may implement more than one function. For example, it may
>> implement both compression and encryption. Say a cluster has 10 devices of
>> device type X, and each of them is programmed to offer 2 instances of
>> function A and 4 instances of function B. More specifically, the device may
>> implement 6 PCI functions, with 2 of them tied to function A, and the other
>> 4 tied to function B. So, we could have 6 separate instances accessing
>> functions on the same device.
>>
>
Does this imply that Cyborg can't reprogram the FPGA at all?


>
>> In the current flow, the device type X is modeled as a resource class, so
>> Placement will count how many of them are in use. A flavor for ‘RC
>> device-type-X + function A’ will consume one instance of the RC
>> device-type-X.  But this is not right because this precludes other
>> functions on the same device instance from getting used.
>>
>> One way to solve this is to declare functions A and B as resource classes
>> themselves and have the flavor request the function RC. Placement will then
>> correctly count the function instances. However, there is still a problem:
>> if the requested function A is not available, Placement will return an
>> empty list of RPs, but we need some way to reprogram some device to create
>> an instance of function A.
>>
>
> Clearly, nova is not going to be reprogramming devices with an instance of
> a particular function.
>
> Cyborg might need to have a separate agent that listens to the nova
> notifications queue and upon seeing an event that indicates a failed build
> due to lack of resources, then Cyborg can try and reprogram a device and
> then try rebuilding the original request.
>

It was my understanding from that discussion that we intend to insert
Cyborg into the spawn workflow for device configuration in the same way
that we currently insert resources provided by Cinder and Neutron. So while
Nova won't be reprogramming a device, it will be calling out to Cyborg to
reprogram a device, and waiting while that happens.

My understanding is (and I concede some areas are a little hazy):

* The flavor says device type X with function Y
* Placement tells us everywhere with device type X
* A weigher orders these by devices which already have an available
function Y (where is this metadata stored?)
* Nova schedules to host Z
* Nova host Z asks cyborg for a local function Y and blocks
  * Cyborg hopefully returns function Y which is already available
  * If not, Cyborg reprograms a function Y, then returns it

Can anybody correct me/fill in the gaps?
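
The steps above can be sketched as a toy model. This is only an illustration
under stated assumptions: the Host class and every function name here are
hypothetical, the weigher metadata is simply held in memory, and reprogramming
is reduced to returning a marker string.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    device_type: str
    # function name -> count of free, already-programmed instances
    free_functions: dict = field(default_factory=dict)


def placement_candidates(hosts, device_type):
    """Placement step: every host that has the requested device type."""
    return [h for h in hosts if h.device_type == device_type]


def weigh(hosts, function):
    """Weigher step: hosts that already have a free function come first."""
    return sorted(
        hosts,
        key=lambda h: h.free_functions.get(function, 0) > 0,
        reverse=True,
    )


def cyborg_get_function(host, function):
    """Cyborg step: reuse a free function if one exists, else reprogram."""
    if host.free_functions.get(function, 0) > 0:
        host.free_functions[function] -= 1
        return "existing"
    # Stand-in for reprogramming; the new function is consumed immediately.
    return "reprogrammed"


def spawn(hosts, device_type, function):
    """Schedule to the best-weighed host and block on Cyborg for the function."""
    candidates = placement_candidates(hosts, device_type)
    if not candidates:
        raise LookupError("no host with device type " + device_type)
    host = weigh(candidates, function)[0]
    return host.name, cyborg_get_function(host, function)


hosts = [
    Host("z1", "X"),
    Host("z2", "X", {"Y": 1}),
    Host("z3", "W", {"Y": 5}),
]
```

The first request for function Y on device type X goes to z2, which already
has it; the second falls back to z1 and triggers the reprogramming path.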

Matt


-- 
Matthew Booth
Red Hat OpenStack Engineer, Compute DFG

Phone: +442070094448 (UK)


Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-02 Thread Jay Pipes

On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:

Hello Nova team,

     During the Cyborg discussion at Rocky PTG, we proposed a flow for 
FPGAs wherein the request spec asks for a device type as a resource 
class, and optionally a function (such as encryption) in the extra 
specs. This does not seem to work well for the usage model that I’ll 
describe below.


An FPGA device may implement more than one function. For example, it may 
implement both compression and encryption. Say a cluster has 10 devices 
of device type X, and each of them is programmed to offer 2 instances of 
function A and 4 instances of function B. More specifically, the device 
may implement 6 PCI functions, with 2 of them tied to function A, and 
the other 4 tied to function B. So, we could have 6 separate instances 
accessing functions on the same device.


In the current flow, the device type X is modeled as a resource class, 
so Placement will count how many of them are in use. A flavor for ‘RC 
device-type-X + function A’ will consume one instance of the RC 
device-type-X.  But this is not right because this precludes other 
functions on the same device instance from getting used.


One way to solve this is to declare functions A and B as resource 
classes themselves and have the flavor request the function RC. 
Placement will then correctly count the function instances. However, 
there is still a problem: if the requested function A is not available, 
Placement will return an empty list of RPs, but we need some way to 
reprogram some device to create an instance of function A.


Clearly, nova is not going to be reprogramming devices with an instance 
of a particular function.


Cyborg might need to have a separate agent that listens to the nova 
notifications queue and upon seeing an event that indicates a failed 
build due to lack of resources, then Cyborg can try and reprogram a 
device and then try rebuilding the original request.
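
As a sketch only, such an agent's event handler might look like the following.
The event dict layout, the "build.failed" and "no_resources" values, and the
two callbacks are all hypothetical placeholders, not real nova notification
fields or Cyborg calls.

```python
def handle_event(event, reprogram_device, rebuild_instance):
    """React to a failed-build event; return True if a rebuild was attempted.

    All keys and values read from 'event' are hypothetical placeholders.
    """
    if event.get("event_type") != "build.failed":
        return False
    if event.get("reason") != "no_resources":
        return False
    # Try to create an instance of the missing function on some device.
    if not reprogram_device(event["requested_function"]):
        return False
    # Only retry the original request once reprogramming has succeeded.
    rebuild_instance(event["instance_id"])
    return True
```

Passing the reprogram and rebuild steps in as callbacks keeps the handler
independent of how the agent talks to the hardware or to nova.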


Best,
-jay



[openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-02 Thread Nadathur, Sundar
Hello Nova team,
During the Cyborg discussion at Rocky PTG, we proposed a flow for FPGAs 
wherein the request spec asks for a device type as a resource class, and 
optionally a function (such as encryption) in the extra specs. This does not 
seem to work well for the usage model that I'll describe below.

An FPGA device may implement more than one function. For example, it may 
implement both compression and encryption. Say a cluster has 10 devices of 
device type X, and each of them is programmed to offer 2 instances of function 
A and 4 instances of function B. More specifically, the device may implement 6 
PCI functions, with 2 of them tied to function A, and the other 4 tied to 
function B. So, we could have 6 separate instances accessing functions on the 
same device.

In the current flow, the device type X is modeled as a resource class, so 
Placement will count how many of them are in use. A flavor for 'RC 
device-type-X + function A' will consume one instance of the RC device-type-X.  
But this is not right because this precludes other functions on the same device 
instance from getting used.

One way to solve this is to declare functions A and B as resource classes 
themselves and have the flavor request the function RC. Placement will then 
correctly count the function instances. However, there is still a problem: if 
the requested function A is not available, Placement will return an empty list 
of RPs, but we need some way to reprogram some device to create an instance of 
function A.
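
The counting difference can be worked through with the numbers from the
example above. This is plain arithmetic rather than the Placement API, and the
names FUNCTION_A and FUNCTION_B are hypothetical resource class names:

```python
DEVICES = 10
FUNCTIONS_PER_DEVICE = {"FUNCTION_A": 2, "FUNCTION_B": 4}


def capacity_device_as_rc(devices):
    """Device type as the RC: each request consumes one whole device."""
    return devices


def capacity_function_as_rc(devices, per_device):
    """Function as the RC: each request consumes one function instance."""
    return {name: devices * count for name, count in per_device.items()}


device_model = capacity_device_as_rc(DEVICES)
function_model = capacity_function_as_rc(DEVICES, FUNCTIONS_PER_DEVICE)
# Function instances that can never be handed out in the device-as-RC model.
stranded = sum(function_model.values()) - device_model
```

Under these assumptions the device-as-RC model can serve 10 requests in total,
while the function-as-RC model exposes 20 instances of A and 40 of B, so 50
usable function instances are stranded in the first model.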

Regards,
Sundar
