[openstack-dev] [cyborg] New time for Cyborg weekly IRC meetings

2018-11-26 Thread Nadathur, Sundar

Hi,
     The current time for the weekly Cyborg IRC meeting is 1400 UTC, 
which is 6 am Pacific and 10 pm China time. That is a bad time for most 
people on the call.


Please vote in the Doodle poll [1] for the time you prefer.

If you need more options, please respond in this thread.

[1] https://doodle.com/poll/eqy3hp8hfqtf2qyn


Thanks & Regards,

Sundar




Re: [openstack-dev] [cyborg] [nova] Poll: Name for VARs

2018-10-26 Thread Nadathur, Sundar
Thanks to all who participated in the discussion and/or voted. The most 
votes, such as there were, went to the name 'Accelerator Requests', 
abbreviated ARQs. The specs will be updated over the next couple of days.


Have a good weekend.

Best Regards,
Sundar

On 10/22/2018 11:37 AM, Nadathur, Sundar wrote:

Hi,
The name VAR (Virtual Accelerator Request) was introduced in 
https://review.openstack.org/#/c/603955/. It came up during the Stein 
PTG and has been used by default, but some folks have said they find 
the name VAR confusing. I would like to resolve this to completion, so 
that whatever name we choose is not subject to recurrent debates in the 
future.


Here is a poll for Cyborg and Nova developers to indicate their 
preferences for existing or proposed options:
https://docs.google.com/spreadsheets/d/179Q8J9qIJNOiVm86K7bWPxo7otTsU18XVCI32V77JaU/edit?usp=sharing 



1. Please add your name, if not already listed, and please feel free 
to propose additional options as you see fit.

2. The voting is by rank -- 1 indicates most preferred.
3. If you strongly oppose a term, you may say 'No' and justify with a 
comment.

   (Comments are added by pressing Ctrl-Alt-M on a cell.)

I'll keep this open for a minimum of two days and possibly for a week 
depending on feedback.


Regards,
Sundar




[openstack-dev] [cyborg] [nova] Poll: Name for VARs

2018-10-22 Thread Nadathur, Sundar

Hi,
The name VAR (Virtual Accelerator Request) was introduced in 
https://review.openstack.org/#/c/603955/. It came up during the Stein 
PTG and has been used by default, but some folks have said they find the 
name VAR confusing. I would like to resolve this to completion, so 
that whatever name we choose is not subject to recurrent debates in the 
future.


Here is a poll for Cyborg and Nova developers to indicate their 
preferences for existing or proposed options:
https://docs.google.com/spreadsheets/d/179Q8J9qIJNOiVm86K7bWPxo7otTsU18XVCI32V77JaU/edit?usp=sharing 



1. Please add your name, if not already listed, and please feel free to 
propose additional options as you see fit.

2. The voting is by rank -- 1 indicates most preferred.
3. If you strongly oppose a term, you may say 'No' and justify with a 
comment.

   (Comments are added by pressing Ctrl-Alt-M on a cell.)

I'll keep this open for a minimum of two days and possibly for a week 
depending on feedback.


Regards,
Sundar




[openstack-dev] [Neutron] [Cyborg] Cyborg-Neutron interaction for programmable NICs

2018-09-04 Thread Nadathur, Sundar

Hello Neutron folks,
     There is emerging interest in programmable NICs that combine FPGAs 
and networking in different ways. I have written up one category of them 
here:


   https://etherpad.openstack.org/p/fpga-networking

This was discussed at the Neutron meeting on Sep 3 [1]. This approach to 
programmable networking raises many questions. I have summarized them in 
this etherpad and proposed a possible solution.


Please review this. We have a session at the PTG on Thursday (Sep 13) 
from 3:15 to 4:15 pm on this topic.


Given the level of interest that we are seeing, I hope we reach some 
agreement early enough that we can do at least some POCs in the Stein cycle.


[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-meeting/%23openstack-meeting.2018-09-03.log.html#t2018-09-03T21:43:48 



Regards,
Sundar


Re: [openstack-dev] [Cyborg] Zoom URL for Aug 29 meeting

2018-08-23 Thread Nadathur, Sundar
Please use this invite instead, because it does not have the time limits 
of the old one (updated in the Cyborg wiki as well).


Time: Aug 29, 2018 10:00 AM Eastern Time (US and Canada)

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/395326369

Or iPhone one-tap:
    US: +16699006833,,395326369#  or +16465588665,,395326369#

Or Telephone:
    Dial (for higher quality, dial a number based on your current 
location):

        US: +1 669 900 6833  or +1 646 558 8665

    Meeting ID: 395 326 369

    International numbers available: https://zoom.us/u/eGbqK3pMh

Thanks,
Sundar


On 8/22/2018 11:39 PM, Nadathur, Sundar wrote:


For the August 29 weekly meeting [1], the main agenda is the 
discussion of Cyborg device/data models.


We will use this meeting invite to present slides:

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/189707867

Or iPhone one-tap:
    US: +16465588665,,189707867#  or +14086380986,,189707867#
Or Telephone:
    Dial (for higher quality, dial a number based on your current 
location):

    US: +1 646 558 8665  or +1 408 638 0986
    Meeting ID: 189 707 867
    International numbers available: https://zoom.us/u/dnYoZcYYJ

[1] https://wiki.openstack.org/wiki/Meetings/CyborgTeamMeeting

Regards,
Sundar




[openstack-dev] [Cyborg] Zoom URL for Aug 29 meeting

2018-08-23 Thread Nadathur, Sundar
For the August 29 weekly meeting [1], the main agenda is the discussion 
of Cyborg device/data models.


We will use this meeting invite to present slides:

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/189707867

Or iPhone one-tap:
    US: +16465588665,,189707867#  or +14086380986,,189707867#
Or Telephone:
    Dial (for higher quality, dial a number based on your current 
location):

    US: +1 646 558 8665  or +1 408 638 0986
    Meeting ID: 189 707 867
    International numbers available: https://zoom.us/u/dnYoZcYYJ


[1] https://wiki.openstack.org/wiki/Meetings/CyborgTeamMeeting

Regards,
Sundar


[openstack-dev] [Cyborg] Update device info in db via REST API or RPC?

2018-08-12 Thread Nadathur, Sundar

Hi all,
  Apparently a decision was taken to have the Cyborg agent update the
Cyborg database with device information using REST APIs, as part of 
discovery.


The use of a REST API has several implications:

* It is open to the public, so we have to authenticate the users and check
  for abuse. Even if it is open only to operators, it is still prone to
  error.

* REST APIs have backwards-compatibility requirements: it will not be easy
  to change the signature or semantics. We also need to check the
  implications for upgrades.

It would be better to make this an RPC API offered by the Cyborg conductor,
which will keep it internal to Cyborg and avoid the issues above.
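
For illustration, here is a minimal sketch of what such an internal RPC 
might look like with oslo.messaging. The topic and method name 
('cyborg-conductor', 'report_devices') are hypothetical, not actual 
Cyborg identifiers:

    # Hypothetical sketch: the agent reports device info to the
    # conductor over internal RPC instead of a public REST API.
    import oslo_messaging as messaging
    from oslo_config import cfg

    def report_devices(context, devices):
        transport = messaging.get_rpc_transport(cfg.CONF)
        target = messaging.Target(topic='cyborg-conductor', version='1.0')
        client = messaging.RPCClient(transport, target)
        # call() waits for the conductor's reply; cast() would be
        # fire-and-forget.
        client.call(context, 'report_devices', devices=devices)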

Thanks.

Regards,
Sundar






Re: [openstack-dev] [Cyborg] Agent - Conductor update

2018-08-08 Thread Nadathur, Sundar

Hi Zhenghao,

On 8/8/2018 4:10 AM, Zhenghao ZH21 Wang wrote:

Hi Sundar,
All looks good to me, and I agree with the new solution you suggested. But 
I am still confused: why would we lose some device info if we do the diff 
on the agent? Could you give me an example of how the loss happens and 
what we would lose?


To do the diff, the agent would need the previous configuration of 
devices on the host. If it keeps that previous config in its process 
memory, it will lose it if it dies and restarts for any reason. So, it 
should persist it. The ideal place to persist that is the Cyborg db. So, 
let us say the agent writes the config to the db each time via the 
conductor.


However, consider the scenario where the agent pushes an update to the 
conductor, and restarts before the conductor has written it to the db. 
This can result in a race condition. If we don't address that properly, 
the agent may get the copy in the db and not the latest update. That is 
the loss we were talking about.


To prevent that race, the restarted agent should ask the conductor to 
get the latest, and the conductor must be smart enough to 'synchronize' 
with the previous unfinished update. This seems like unnecessary 
complication.


I think this is what you are asking about. If not, please let me know 
what you meant.



Best regards
Zhenghao Wang
Cloud Researcher

Email: wangz...@lenovo.com
Tel: (+86) 18519550096
Enterprise & Cloud Research Lab, Lenovo Research
No.6 Shangdi West Road, Haidian District, Beijing

Regards,
Sundar



[openstack-dev] [Cyborg] Agent - Conductor update

2018-08-06 Thread Nadathur, Sundar

Hi,
   The Cyborg agent in a compute node collects information about 
devices from the Cyborg drivers on that node. It then needs to push that 
information to the Cyborg conductor in the controller, which then needs 
to persist it in the Cyborg db and update Placement. Further, the agent 
needs to collect and update this information periodically (or possibly 
in response to notifications) to handle hot add/delete of devices, 
reprogramming (for FPGAs), health failure of devices, etc.


In this morning's call, we discussed how to do this periodic update [1]. 
In particular, we talked about how to compute the difference between the 
previous device configuration in a compute node and the current one, 
whether the agent or the controller should do that diff, etc. Since 
there are many fields per device, and they are tree-structured, the 
complexity of doing the diff seemed large.


On taking a closer look, however, the amount of computation needed to do 
the update is not huge. Say, for discussion's sake, that the controller 
has a snapshot of the entire device config for a specific compute node, 
i.e. an array of device structures NewConfig[]. It reads the current 
list of devices for that node from the db, CurrentConfig[]. Then the 
controller's logic is like this:


 * Determine the list of devices in NewConfig[] but not in
   CurrentConfig[] (this is a set difference in Python [2]): they are
   the newly added ones. For each newly added device, do a single
   transaction to add all the fields to the db together.
 * Determine the list of devices in CurrentConfig[] but not in
   NewConfig[]: they are the deleted devices. For each such device, do a
   single transaction to delete that entry.
 * For each modified device, compute what has changed, and update that
   alone. This is the per-field diff. (A minimal sketch of this logic
   appears after this list.)
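
Here is that minimal sketch of the set-difference logic, assuming each 
device dict carries a unique 'device_id' key (all names here are 
illustrative):

    def diff_device_configs(new_config, current_config):
        new_by_id = {d['device_id']: d for d in new_config}
        cur_by_id = {d['device_id']: d for d in current_config}

        added = set(new_by_id) - set(cur_by_id)    # newly added devices
        deleted = set(cur_by_id) - set(new_by_id)  # deleted devices
        # In both configs, but with changed fields; the per-field diff
        # is computed only for these.
        modified = {i for i in set(new_by_id) & set(cur_by_id)
                    if new_by_id[i] != cur_by_id[i]}

        return ([new_by_id[i] for i in added],
                [cur_by_id[i] for i in deleted],
                [new_by_id[i] for i in modified])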

Say each field in the device structure is a string of 100 characters, 
and it takes 1 nanosecond to add, delete or modify a character. So, each 
field takes 100 ns to update (add/delete/modify). Say 20 fields per 
device: so 2 us to add, delete or modify a device. Say 10 devices per 
compute node: so 20 us per node. 500 nodes will take 10 milliseconds. 
So, if each node sends a refresh every second, the controller will spend 
a very small fraction of that time in updating the db, even including 
transaction costs, set difference computation, etc.


This back-of-the-envelope calculation shows that we need not try to 
optimize too early: the agent should send the entire device config over 
to the controller, and let it update the db per-device and per-field.


[1] https://etherpad.openstack.org/p/cyborg-rocky-development
[2] https://docs.python.org/2/library/sets.html

Regards,
Sundar


Re: [openstack-dev] [Nova] [Cyborg] Updates to os-acc proposal

2018-08-01 Thread Nadathur, Sundar

Hi Eric,
    Please see my responses inline. On an unrelated note, thanks for 
the pointer to the GPU spec 
(https://review.openstack.org/#/c/579359/10/doc/source/specs/rocky/device-passthrough.rst). 
I will review that.


On 7/31/2018 10:42 AM, Eric Fried wrote:

Sundar-


   * Cyborg drivers deal with device-specific aspects, including
 discovery/enumeration of devices and handling the Device Half of the
 attach (preparing devices/accelerators for attach to an instance,
 post-attach cleanup (if any) after successful attach, releasing
 device/accelerator resources on instance termination or failed
 attach, etc.)
   * os-acc plugins deal with hypervisor/system/architecture-specific
 aspects, including handling the Instance Half of the attach (e.g.
 for libvirt with PCI, preparing the XML snippet to be included in
 the domain XML).

This sounds well and good, but discovery/enumeration will also be
hypervisor/system/architecture-specific. So...
Fair enough. We had discussed that too. The Cyborg drivers can also 
invoke REST APIs etc. for Power.

Thus, the drivers and plugins are expected to be complementary. For
example, for 2 devices of types T1 and T2, there shall be 2 separate
Cyborg drivers. Further, we would have separate plugins for, say,
x86+KVM systems and Power systems. We could then have four different
deployments -- T1 on x86+KVM, T2 on x86+KVM, T1 on Power, T2 on Power --
by suitable combinations of the drivers and plugins.

...the discovery/enumeration code for T1 on x86+KVM (lsdev? lspci?
walking the /dev file system?) will be totally different from the
discovery/enumeration code for T1 on Power
(pypowervm.wrappers.ManagedSystem.get(adapter)).

I don't mind saying "drivers do the device side; plugins do the instance
side" but I don't see getting around the fact that both "sides" will
need to have platform-specific code

Agreed. So, we could say:
- The plugins do the instance half. They are hypervisor-specific and 
platform-specific. (The term 'platform' subsumes both the architecture 
(Power, x86) and the server/system type.) They are invoked by os-acc.
- The drivers do the device half, device discovery/enumeration and 
anything not explicitly assigned to plugins. They contain 
device-specific and platform-specific code. They are invoked by the 
Cyborg agent and os-acc.


Are you ok with the workflow in 
https://docs.google.com/drawings/d/1cX06edia_Pr7P5nOB08VsSMsgznyrz4Yy2u8nb596sU/edit?usp=sharing 
?

One secondary detail to note is that Nova compute calls os-acc per
instance for all accelerators for that instance, not once for each
accelerator.

You mean for getVAN()?
Yes -- BTW, I renamed it as prepareVANs() or prepareVAN(), because it is 
not just a query as the name getVAN implies, but has side effects.

Because AFAIK, os_vif.plug(list_of_vif_objects,
InstanceInfo) is *not* how nova uses os-vif for plugging.


Yes, the os-acc will invoke the plug() once per VAN. IIUC, Nova calls 
Neutron once per instance for all networks, as seen in this code 
sequence in nova/nova/compute/manager.py:


_build_and_run_instance() --> _build_resources() -->

    _build_networks_for_instance() --> _allocate_network()

The _allocate_network() actually takes a list of requested_networks, and 
handles all networks for an instance [1].


Chasing this further down:

_allocate_network --> _allocate_network_async()

--> self.network_api.allocate_for_instance()

 == nova/network/rpcapi.py::allocate_for_instance()

So, even the RPC out of Nova seems to take a list of networks [2].

[1] 
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1529
[2] 
https://github.com/openstack/nova/blob/master/nova/network/rpcapi.py#L163

Thanks,
Eric

Regards,
Sundar



[openstack-dev] [Nova] [Cyborg] Updates to os-acc proposal

2018-07-30 Thread Nadathur, Sundar

Hi Eric and all,
    With recent discussions [1], we have convergence on how Power and 
other architectures can use Cyborg. Before I update the spec [2], I am 
setting down some key aspects of the updates, so that we are all aligned.


The accelerator - instance attachment has two parts:

 * The connection between the accelerator and a host-visible attach
   handle, such as a PCI function or a mediated device UUID. We call
   this the Device Half of the attach.
 * The connection between the attach handle and the instance. We name
   this the Instance Half of the attach.

I propose two different extensibility mechanisms:

 * Cyborg drivers deal with device-specific aspects, including
   discovery/enumeration of devices and handling the Device Half of the
   attach (preparing devices/accelerators for attach to an instance,
   post-attach cleanup (if any) after successful attach, releasing
   device/accelerator resources on instance termination or failed
   attach, etc.)
 * os-acc plugins deal with hypervisor/system/architecture-specific
   aspects, including handling the Instance Half of the attach (e.g.
   for libvirt with PCI, preparing the XML snippet to be included in
   the domain XML).

When invoked by Nova compute to attach accelerator(s) to an instance, 
os-acc would call the Cyborg driver to prepare a VAN (Virtual 
Accelerator Nexus, which is a handle object for attaching an accelerator 
to an instance, similar to VIFs for networking). Such preparation may 
involve configuring the device in some way, including programming for 
FPGAs. This sets up a VAN object with the necessary data for the attach 
(e.g. PCI VF, Power DRC index, etc.). Then the os-acc would call a 
plugin to do the needful for that hypervisor, using that VAN. Finally 
the os-acc may call the Cyborg driver again to do any post-attach 
cleanup, if needed.
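
In rough pseudocode, the per-accelerator flow described above might look 
like this (prepareVAN() and postplug() are the proposed driver 
interfaces; plugin.plug() is a hypothetical plugin method):

    # Illustrative sketch of the os-acc attach flow for one accelerator.
    def attach_accelerator(instance_info, accel_request, driver, plugin):
        # Device Half: the driver prepares the device and returns a VAN
        # (e.g. with a PCI VF or Power DRC index filled in).
        van = driver.prepareVAN(accel_request)
        # Instance Half: the hypervisor-specific plugin attaches the VAN
        # (e.g. prepares the snippet to include in the libvirt domain XML).
        plugin.plug(instance_info, van)
        # Optional post-attach cleanup by the driver.
        driver.postplug(van)
        return van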


A more detailed workflow is here: 
https://docs.google.com/drawings/d/1cX06edia_Pr7P5nOB08VsSMsgznyrz4Yy2u8nb596sU/edit?usp=sharing 



Thus, the drivers and plugins are expected to be complementary. For 
example, for 2 devices of types T1 and T2, there shall be 2 separate 
Cyborg drivers. Further, we would have separate plugins for, say, 
x86+KVM systems and Power systems. We could then have four different 
deployments -- T1 on x86+KVM, T2 on x86+KVM, T1 on Power, T2 on Power -- 
by suitable combinations of the drivers and plugins.


It is possible that there may be scenarios where the separation of roles 
between the plugins and the drivers is not so clear-cut. That can be 
addressed by allowing the plugins to call into Cyborg drivers in the 
future and/or by other mechanisms.


One secondary detail to note is that Nova compute calls os-acc per 
instance for all accelerators for that instance, not once for each 
accelerator. There are two reasons for that:


 * I think this is how Nova deals with os-vif [3].
 * If some accelerators got allocated/configured, and the next
   accelerator configuration fails, a rollback needs to be done. This
   is better done in os-acc than in Nova compute (see the sketch after
   this list).
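
A minimal sketch of that per-instance loop with rollback, reusing the 
hypothetical attach_accelerator() above (unplug() likewise stands in for 
whatever detach path the driver and plugin provide):

    def plug_accelerators(instance_info, accel_requests, driver, plugin):
        attached = []
        try:
            for req in accel_requests:
                attached.append(attach_accelerator(instance_info, req,
                                                   driver, plugin))
        except Exception:
            # Roll back the accelerators already attached, then re-raise
            # so Nova compute sees a single failure for the instance.
            for van in reversed(attached):
                unplug(instance_info, van, driver, plugin)
            raise
        return attached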

Cyborg drivers are invoked both by the Cyborg agent (for 
discovery/enumeration) and by os-acc (for instance attach). Both shall 
use Stevedore to locate and load the drivers. A single Python module may 
implement both sets of interfaces, like this:


+--------------+               +--------+
| Nova Compute |               | Cyborg |
+------+-------+               | Agent  |
       |                       +---+----+
  +----v----+                      |
  | os-acc  |                      |
  +----+----+                      |
       |                           |
       |       Cyborg driver       |
+------v---------------+-----------v------+
| UN/PLUG ACCELERATORS | DISCOVER         |
| FROM INSTANCES       | ACCELERATORS     |
|                      |                  |
| * can_handle()       | * get_devices()  |
| * prepareVAN()       |                  |
| * postplug()         |                  |
| * unprepareVAN()     |                  |
+----------------------+------------------+

If there are no objections to the above, I will update the spec [2].

[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-cyborg/%23openstack-cyborg.2018-07-30.log.html#t2018-07-30T16:25:41-2 


[2] https://review.openstack.org/#/c/577438/
[3] 
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1529


Regards,
Sundar


Re: [openstack-dev] [Nova] [Cyborg] [Glance] Updated spec for Cyborg-Nova-Glance interaction, including os-acc

2018-06-22 Thread Nadathur, Sundar

s/review the new version is/review the new version/

Regards,
Sundar
On 6/22/2018 8:06 AM, Nadathur, Sundar wrote:

Hello folks,
The os-acc spec [1] has been updated substantially. Please review the 
new version is https://review.openstack.org/#/c/577438/ .


The background for the update is that several important aspects were 
raised as comments on the previous spec ([1], [2]). An alternative 
workflow for attaching accelerators to instances was proposed [3], to 
which I responded with [4] and [5]. Finally, with another IRC 
discussion [6], it was concluded that the design/flow in [4], [5] fits 
the bill. The new version of the os-acc spec incorporates that discussion.


The main points that were raised and addressed are these:

* Some architectures like Power treat devices differently. The os-acc 
framework must provide for plugins to handle such variation. Done.


* The os-acc framework should be more closely patterned after the 
os-vif framework and Neutron flow. This is a bit debatable since 
Neutron ports and Cyborg accelerators differ in some key respects, 
though the os-acc library can be structured like os-vif. I have 
attempted to compare and contrast the os-vif and os-acc approaches.


This discussion is important because we may have programmable NICs 
based on FPGAs. Then Cyborg, Neutron and Nova are going to get tangled 
in a triangle. (If you throw Glance in for FPGA images, that leads 
quickly to a quadrilateral. Add Cinder for storage-related FPGA 
devices, and we get pulled into a pentagram. Geometry is scary. Just 
saying. ;-} )


* Not enough detail in [1]. Mea culpa. Hopefully fixed now.

[1] https://review.openstack.org/#/c/566798/

[2] 
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-06-14.log.html#t2018-06-14T18:38:28 



[3] 
https://review.openstack.org/#/c/575545/1/specs/rocky/approved/nova-cyborg-flow.rst 



[4] https://etherpad.openstack.org/p/os-acc-discussion

[5] 
https://docs.google.com/drawings/d/1gbfimiyA1f5sTeobN9mpavEkHT7Z_ScNUqimOkdIYGA/edit 



[6] 
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-06-18.log.html#t2018-06-18T22:07:02 



Regards,
Sundar





[openstack-dev] [Nova] [Cyborg] [Glance] Updated spec for Cyborg-Nova-Glance interaction, including os-acc

2018-06-22 Thread Nadathur, Sundar

Hello folks,
The os-acc spec [1] has been updated substantially. Please review the 
new version is https://review.openstack.org/#/c/577438/ .


The background for the update is that several important aspects were 
raised as comments on the previous spec ([1], [2]). An alternative 
workflow for attaching accelerators to instances was proposed [3], to 
which I responded with [4] and [5]. Finally, with another IRC discussion 
[6], it was concluded that the design/flow in [4], [5] fits the bill. 
The new version of the os-acc spec incorporates that discussion.


The main points that were raised and addressed are these:

* Some architectures like Power treat devices differently. The os-acc 
framework must provide for plugins to handle such variation. Done.


* The os-acc framework should be more closely patterned after the os-vif 
framework and Neutron flow. This is a bit debatable since Neutron ports 
and Cyborg accelerators differ in some key respects, though the os-acc 
library can be structured like os-vif. I have attempted to compare and 
contrast the os-vif and os-acc approaches.


This discussion is important because we may have programmable NICs based 
on FPGAs. Then Cyborg, Neutron and Nova are going to get tangled in a 
triangle. (If you throw Glance in for FPGA images, that leads quickly to 
a quadrilateral. Add Cinder for storage-related FPGA devices, and we get 
pulled into a pentagram. Geometry is scary. Just saying. ;-} )


* Not enough detail in [1]. Mea culpa. Hopefully fixed now.

[1] https://review.openstack.org/#/c/566798/

[2] 
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-06-14.log.html#t2018-06-14T18:38:28 



[3] 
https://review.openstack.org/#/c/575545/1/specs/rocky/approved/nova-cyborg-flow.rst 



[4] https://etherpad.openstack.org/p/os-acc-discussion

[5] 
https://docs.google.com/drawings/d/1gbfimiyA1f5sTeobN9mpavEkHT7Z_ScNUqimOkdIYGA/edit 



[6] 
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-06-18.log.html#t2018-06-18T22:07:02 



Regards,
Sundar



[openstack-dev] [Cyborg] [Nova] Backup plan without nested RPs

2018-06-04 Thread Nadathur, Sundar

Hi,
 Cyborg needs to create RCs and traits for accelerators. The 
original plan was to do that with nested RPs. To avoid rushing the Nova 
developers, I had proposed that Cyborg could start by applying the 
traits to the compute node RP, and accept the resulting caveats for 
Rocky, till we get nested RP support. That proposal did not find many 
takers, and Cyborg has essentially been in waiting mode.


Since it is June already, and there is a risk of not delivering anything 
meaningful in Rocky, I am reviving my older proposal, which is 
summarized as below:


 * Cyborg shall create the RCs and traits as per spec
   (https://review.openstack.org/#/c/554717/), both in Rocky and
   beyond. Only the RPs will change post-Rocky.
 * In Rocky:
 o Cyborg will not create nested RPs. It shall apply the device
   traits to the compute node RP.
 o Cyborg will document the resulting caveat, i.e., all devices in
   the same host should have the same traits. In particular, we
   cannot have a GPU and a FPGA, or 2 FPGAs of different types, in
   the same host.
 o Cyborg will document that upgrades to post-Rocky releases will
   require operator intervention (as described below).
 * For upgrade to the post-Rocky world with nested RPs:
 o The operator needs to stop all running instances that use an
   accelerator.
 o The operator needs to run a script that removes the Cyborg
   traits and the inventory for Cyborg RCs from compute node RPs.
 o The operator can then perform the upgrade. The new Cyborg
   agent/driver(s) shall create nested RPs and publish
   inventory/traits as specified.

IMHO, it is acceptable for Cyborg to do this because it is new and we 
can set expectations for the (lack of) upgrade plan. The alternative is 
that potentially no meaningful use cases get addressed in Rocky for Cyborg.


Please LMK what you think.

Regards,
Sundar


Re: [openstack-dev] [Cyborg] [Nova] Cyborg traits

2018-05-31 Thread Nadathur, Sundar

On 5/30/2018 1:18 PM, Eric Fried wrote:

This all sounds fully reasonable to me.  One thing, though...


   * There is a resource class per device category e.g.
 CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA.

Let's propose standard resource classes for these ASAP.

https://github.com/openstack/nova/blob/d741f624c81baf89fc8b6b94a2bc20eb5355a818/nova/rc_fields.py

-efried
Makes sense, Eric. The obvious names would be ACCELERATOR_GPU and 
ACCELERATOR_FPGA. Do we just submit a patch to rc_fields.py?
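
For reference, rc_fields.py keeps the standard classes in a tuple, so 
the change would be roughly along these lines (an illustrative sketch, 
not a merged patch):

    # Hypothetical sketch of adding standard accelerator resource
    # classes in nova/rc_fields.py, using the names proposed above.
    class ResourceClass(fields.StringField):
        VCPU = 'VCPU'
        MEMORY_MB = 'MEMORY_MB'
        DISK_GB = 'DISK_GB'
        # ... existing classes ...
        ACCELERATOR_GPU = 'ACCELERATOR_GPU'    # proposed
        ACCELERATOR_FPGA = 'ACCELERATOR_FPGA'  # proposed

        STANDARD = (VCPU, MEMORY_MB, DISK_GB,
                    ACCELERATOR_GPU, ACCELERATOR_FPGA)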


Thanks,
Sundar



Re: [openstack-dev] [Cyborg] [Nova] Cyborg traits

2018-05-30 Thread Nadathur, Sundar

Hi Sylvain,
  Glad to know we are on the same page. I haven't updated the spec with 
this proposal yet, in case more comments come in :). I will do so by today.


Thanks,
Sundar

On 5/30/2018 12:34 AM, Sylvain Bauza wrote:



On Wed, May 30, 2018 at 1:33 AM, Nadathur, Sundar wrote:


Hi all,
   The Cyborg/Nova scheduling spec [1] details what traits will be
applied to the resource providers that represent devices like
GPUs. Some of the traits referred to vendor names. I got feedback
that traits must not refer to products or specific models of
devices. I agree. However, we need some reference to device types
to enable matching the VM driver with the device.

TL;DR We need some reference to device types, but we don't need
product names. I will update the spec [1] to clarify that. Rest of
this email clarifies why we need device types in traits, and what
traits we propose to include.

In general, an accelerator device is operated by two pieces of
software: a driver in the kernel (which may discover and handle
the PF for SR-IOV  devices), and a driver/library in the guest
(which may handle the assigned VF).

The device assigned to the VM must match the driver/library
packaged in the VM. For this, the request must explicitly state
what category of devices it needs. For example, if the VM needs a
GPU, it needs to say whether it needs an AMD GPU or an Nvidia GPU,
since it may have the driver/libraries for that vendor alone. It
may also need to state what version of Cuda is needed, if it is a
Nvidia GPU. These aspects are necessarily vendor-specific.


FWIW, the vGPU implementation for Nova also has the same concern. We 
want to provide traits to explicitly say "use this vGPU type", but 
given it's related to a specific vendor, we can't just say "ask for 
this frame buffer size, or just for the display heads", but rather "we 
need a vGPU accepting a Quadro vDWS license".


Further, one driver/library version may handle multiple devices.
Since a new driver version may be backwards compatible, multiple
driver versions may manage the same device. The
development/release of the driver/library inside the VM should be
independent of the kernel driver for that device.


I agree.

For FPGAs, there is an additional twist as the VM may need
specific bitstream(s), and they match only specific device/region
types. The bitstream for a device from a vendor will not fit any
other device from the same vendor, let alone other vendors. IOW,
the region type is specific not just to a vendor but to a device
type within the vendor. So, it is essential to identify the device
type.

So, the proposed set of RCs and traits are as below. As we learn
more about actual usages by operators, we may need to evolve this set.

  * There is a resource class per device category e.g.
CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA.
  * The resource provider that represents a device has the
following traits:
  o Vendor/Category trait: e.g. CUSTOM_GPU_AMD,
CUSTOM_FPGA_XILINX.
  o Device type trait which is a refinement of vendor/category
trait e.g. CUSTOM_FPGA_XILINX_VU9P.

NOTE: This is not a product or model, at least for FPGAs.
Multiple products may use the same FPGA chip.
NOTE: The reason for having both the vendor/category and
this one is that a flavor may ask for either, depending on
the granularity desired. IOW, if one driver can handle all
devices from a vendor (*eye roll*), the flavor can ask for
the vendor/category trait alone. If there are separate
drivers for different device families from the same
vendor, the flavor must specify the trait for the device
family.
NOTE: The equivalent trait for GPUs may be like
CUSTOM_GPU_NVIDIA_P90, but I'll let others decide if that
is a product or not.


I was about to propose the same for vGPUs in Nova, ie. using custom 
traits. The only concern is that we need operators to set the traits 
directly using osc-placement instead of having Nova magically provide 
those traits. But anyway, given operators need to set the vGPU types 
they want, I think it's acceptable.



  o For FPGAs, we have additional traits:
  + Functionality trait: e.g. CUSTOM_FPGA_COMPUTE,
CUSTOM_FPGA_NETWORK, CUSTOM_FPGA_STORAGE
  + Region type ID.  e.g. CUSTOM_FPGA_INTEL_REGION_.
  + Optionally, a function ID, indicating what function is
currently programmed in the region RP. e.g.
CUSTOM_FPGA_INTEL_FUNCTION_. Not all
implementations may provide it. The function trait may
change on reprogramming, but it is not expected to be
frequent.
  + Possibly, CUSTOM_PROGRAMMABLE as a separate trait.

[openstack-dev] [Cyborg] [Nova] Cyborg traits

2018-05-29 Thread Nadathur, Sundar

Hi all,
   The Cyborg/Nova scheduling spec [1] details what traits will be 
applied to the resource providers that represent devices like GPUs. Some 
of the traits referred to vendor names. I got feedback that traits must 
not refer to products or specific models of devices. I agree. However, 
we need some reference to device types to enable matching the VM driver 
with the device.


TL;DR We need some reference to device types, but we don't need product 
names. I will update the spec [1] to clarify that. Rest of this email 
clarifies why we need device types in traits, and what traits we propose 
to include.


In general, an accelerator device is operated by two pieces of software: 
a driver in the kernel (which may discover and handle the PF for SR-IOV  
devices), and a driver/library in the guest (which may handle the 
assigned VF).


The device assigned to the VM must match the driver/library packaged in 
the VM. For this, the request must explicitly state what category of 
devices it needs. For example, if the VM needs a GPU, it needs to say 
whether it needs an AMD GPU or an Nvidia GPU, since it may have the 
driver/libraries for that vendor alone. It may also need to state what 
version of Cuda is needed, if it is a Nvidia GPU. These aspects are 
necessarily vendor-specific.


Further, one driver/library version may handle multiple devices. Since a 
new driver version may be backwards compatible, multiple driver versions 
may manage the same device. The development/release of the 
driver/library inside the VM should be independent of the kernel driver 
for that device.


For FPGAs, there is an additional twist as the VM may need specific 
bitstream(s), and they match only specific device/region types. The 
bitstream for a device from a vendor will not fit any other device from 
the same vendor, let alone other vendors. IOW, the region type is 
specific not just to a vendor but to a device type within the vendor. 
So, it is essential to identify the device type.


So, the proposed set of RCs and traits are as below. As we learn more 
about actual usages by operators, we may need to evolve this set.


 * There is a resource class per device category e.g.
   CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA.
 * The resource provider that represents a device has the following traits:
 o Vendor/Category trait: e.g. CUSTOM_GPU_AMD, CUSTOM_FPGA_XILINX.
 o Device type trait which is a refinement of vendor/category trait
   e.g. CUSTOM_FPGA_XILINX_VU9P.

   NOTE: This is not a product or model, at least for FPGAs.
   Multiple products may use the same FPGA chip.
   NOTE: The reason for having both the vendor/category and this
   one is that a flavor may ask for either, depending on the
   granularity desired. IOW, if one driver can handle all devices
   from a vendor (*eye roll*), the flavor can ask for the
   vendor/category trait alone. If there are separate drivers for
   different device families from the same vendor, the flavor must
   specify the trait for the device family.
   NOTE: The equivalent trait for GPUs may be like
   CUSTOM_GPU_NVIDIA_P90, but I'll let others decide if that is a
   product or not.

 o For FPGAs, we have additional traits:
 + Functionality trait: e.g. CUSTOM_FPGA_COMPUTE,
   CUSTOM_FPGA_NETWORK, CUSTOM_FPGA_STORAGE
 + Region type ID.  e.g. CUSTOM_FPGA_INTEL_REGION_.
 + Optionally, a function ID, indicating what function is
   currently programmed in the region RP. e.g.
   CUSTOM_FPGA_INTEL_FUNCTION_. Not all implementations
   may provide it. The function trait may change on
   reprogramming, but it is not expected to be frequent.
 + Possibly, CUSTOM_PROGRAMMABLE as a separate trait.
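
To make the granularity point concrete, a flavor could request one FPGA 
of a given family via placement-style extra specs; a hypothetical 
example with python-novaclient, assuming the RC and traits above have 
been created:

    # Illustrative only: request 1 FPGA that must be a Xilinx VU9P.
    # 'flavor' is a python-novaclient Flavor object.
    flavor.set_keys({
        'resources:CUSTOM_ACCELERATOR_FPGA': '1',
        'trait:CUSTOM_FPGA_XILINX_VU9P': 'required',
    })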

[1] https://review.openstack.org/#/c/554717/

Thanks.

Regards,
Sundar


Re: [openstack-dev] [cyborg] [nova] Cyborg quotas

2018-05-18 Thread Nadathur, Sundar

On 5/18/2018 5:06 AM, Sylvain Bauza wrote:



On Fri, May 18, 2018 at 13:59, Nadathur, Sundar wrote:


Hi Matt,

On 5/17/2018 3:18 PM, Matt Riedemann wrote:

On 5/17/2018 3:36 PM, Nadathur, Sundar wrote:

This applies only to the resources that Nova handles, IIUC,
which does not handle accelerators. The generic method that Alex
talks about is obviously preferable but, if that is not
available in Rocky, is the filter an option?


If nova isn't creating accelerator resources managed by cyborg, I
have no idea why nova would be doing quota checks on those types
of resources. And no, I don't think adding a scheduler filter to
nova for checking accelerator quota is something we'd add either.
I'm not sure that would even make sense - the quota for the
resource is per tenant, not per host is it? The scheduler filters
work on a per-host basis.

Can we not extend BaseFilter.filter_all() to get all the hosts in
a filter?
https://github.com/openstack/nova/blob/master/nova/filters.py#L36

I should have made it clearer that this putative filter will be
out-of-tree, and needed only till better solutions become available.


No, there are two clear parameters for a filter, and changing that 
would mean a new paradigm for FilterScheduler.
If you need to have a check across all the hosts, maybe it should be 
either a pre-filter for Placement or a post-filter, but we don't accept 
out-of-tree ones yet.


Thanks, Sylvain. So, the filter approach got filtered out.

Matt had mentioned that Cinder volume quotas are not checked by Nova 
either, citing:

 https://bugs.launchpad.net/nova/+bug/1742102
That includes this comment:
    https://bugs.launchpad.net/nova/+bug/1742102/comments/4
I'll check how Cinder does it today.

Thanks to all for your valuable input.

Regards,
Sundar


Re: [openstack-dev] [cyborg] [nova] Cyborg quotas

2018-05-18 Thread Nadathur, Sundar

Hi Matt,

On 5/17/2018 3:18 PM, Matt Riedemann wrote:

On 5/17/2018 3:36 PM, Nadathur, Sundar wrote:
This applies only to the resources that Nova handles, IIUC, which 
does not handle accelerators. The generic method that Alex talks 
about is obviously preferable but, if that is not available in Rocky, 
is the filter an option?


If nova isn't creating accelerator resources managed by cyborg, I have 
no idea why nova would be doing quota checks on those types of 
resources. And no, I don't think adding a scheduler filter to nova for 
checking accelerator quota is something we'd add either. I'm not sure 
that would even make sense - the quota for the resource is per tenant, 
not per host is it? The scheduler filters work on a per-host basis.

Can we not extend BaseFilter.filter_all() to get all the hosts in a filter?
https://github.com/openstack/nova/blob/master/nova/filters.py#L36

I should have made it clearer that this putative filter will be 
out-of-tree, and needed only till better solutions become available.


Like any other resource in openstack, the project that manages that 
resource should be in charge of enforcing quota limits for it.
Agreed. Not sure how other projects handle it, but here's the situation 
for Cyborg. A request may get scheduled on a compute node with no 
intervention by Cyborg. So, the earliest check that can be made today is 
in the selected compute node. A simple approach can result in quota 
violations as in this example.


   Say there are 5 devices in a cluster. A tenant has a quota of 4 and
   is currently using 3. That leaves 2 unused devices, of which the
   tenant is permitted to use only one. But the tenant may submit two
   concurrent requests, and they may land on two different compute
   nodes. The Cyborg agent in each node will see the current tenant
   usage as 3 and let the request go through, resulting in a quota violation.

To prevent this, we need some kind of atomic update, like SQLAlchemy's 
with_lockmode():
https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Pessimistic_Locking_-_SELECT_FOR_UPDATE 

That seems to have issues, as documented in the link above. Also, since 
every compute node does that, it would also serialize the bringup of all 
instances with accelerators across the cluster.
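
For illustration, a sketch of such an atomic check-and-update with a row 
lock in SQLAlchemy (QuotaUsage and its columns are hypothetical; newer 
SQLAlchemy spells with_lockmode() as with_for_update()):

    def consume_quota(session, project_id, resource_class, amount, limit):
        # Assumes no transaction is already in progress on the session.
        with session.begin():
            usage = (session.query(QuotaUsage)
                     .filter_by(project_id=project_id,
                                resource_class=resource_class)
                     .with_for_update()   # SELECT ... FOR UPDATE row lock
                     .one())
            if usage.in_use + amount > limit:
                raise QuotaExceeded()     # hypothetical exception
            usage.in_use += amount        # committed when the block exits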


If there is a better solution, I'll be happy to hear it.

Thanks,
Sundar






Re: [openstack-dev] [cyborg] [nova] Cyborg quotas

2018-05-17 Thread Nadathur, Sundar

Hi all,
    Thanks for all the feedback. Please see below.

2018-05-17 1:24 GMT+08:00 Jay Pipes wrote:


   Placement already stores usage information for all allocations of
   resources. There is already even a /usages API endpoint that you can
   specify a project and/or user:

   https://developer.openstack.org/api-ref/placement/#list-usages

   I see no reason not to use it.

 This does not seem to be per-project (per-tenant). Given a tenant ID 
and a resource class, we want to get usages of that RC by that tenant. 
Please LMK if I misunderstood something.


As Matt mentioned, Nova does not handle accelerators and presumably 
would not handle quotas for them either.


On 5/16/2018 11:34 PM, Alex Xu wrote:

   2018-05-17 1:24 GMT+08:00 Jay Pipes wrote:

   [...]

   There is already actually a spec to use placement for quota
   usage checks in Nova here:

   https://review.openstack.org/#/c/509042/


   FYI, I'm working on a spec which appends to that spec. It's about
   counting quota for resource classes (GPU, custom RC, etc.) other
   than nova's built-in resources (cores, ram). It should be able to count
   the resource classes which are used by cyborg. But yes, we probably
   should answer Matt's question first, whether we should let Nova
   count quota instead of Cyborg.


here is the link: https://review.openstack.org/#/c/569011/


Alex, is this expected to be implemented by Rocky?



Probably best to have a look at that and see if it will end up
meeting your needs.

  * Cyborg provides a filter for the Nova scheduler, which checks
    whether the project making the request has exceeded its own quota.


Quota checks happen before Nova's scheduler gets involved, so having
a scheduler filter handle quota usage checking is pretty much a
non-starter.

This applies only to the resources that Nova handles, IIUC, which does 
not handle accelerators. The generic method that Alex talks about is 
obviously preferable but, if that is not available in Rocky, is the 
filter an option?



I'll have a look at the patches you've proposed and comment there.


Thanks!



Best,
-jay



Regards,
Sundar


[openstack-dev] [cyborg] [nova] Cyborg quotas

2018-05-16 Thread Nadathur, Sundar

Hi,
   The Cyborg quota spec [1] proposes to implement a quota (maximum 
usage) for accelerators on a per-project basis, to prevent one project 
(tenant) from over-using some resources and starving other tenants. 
There are separate resource classes for different accelerator types 
(GPUs, FPGAs, etc.), and so we can do quotas per RC.


The current proposal [2] is to track the usage in Cyborg agent/driver. I 
am not sure that scheme will work, as I have indicated in the comments 
on [1]. Here is another possible way.


 * The operator configures the oslo.limit in keystone per-project
   per-resource-class (GPU, FPGA, ...).
 o Until this gets into Keystone, Cyborg may define its own quota
   table, as defined in [1].
 * Cyborg implements a table to track per-project usage, as defined in [1].
 * Cyborg provides a filter for the Nova scheduler, which checks
   whether the project making the request has exceeded its own quota.
 o If so, it removes all candidates, thus failing the request.
 o If not, it updates the per-project usage in its own DB. Since
   this is an out-of-tree filter, at least to start with, it should
   be ok to directly update the db without making REST API calls.

IOW, the resource usage tracking and enforcement are done as part of the 
request scheduling, rather than done at the compute node.
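
A rough sketch of what such an out-of-tree filter could look like 
(BaseHostFilter is Nova's filter base class; the quota/usage helpers are 
hypothetical, backed by the Cyborg tables described above):

    from nova.scheduler import filters

    class AcceleratorQuotaFilter(filters.BaseHostFilter):
        """Fail all hosts if the project exceeds its accelerator quota."""

        def host_passes(self, host_state, spec_obj):
            project_id = spec_obj.project_id
            # Hypothetical helpers backed by Cyborg's quota/usage tables.
            for rc, amount in requested_accelerators(spec_obj).items():
                if usage(project_id, rc) + amount > quota(project_id, rc):
                    return False  # removes every candidate; request fails
            return True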


If there are better ways, or ways to avoid a filter, please LMK.

[1] https://review.openstack.org/#/c/560285/
[2] https://review.openstack.org/#/c/564968/

Thanks.

Regards,
Sundar



[openstack-dev] [cyborg] [glance] [nova] Cyborg/Nova spec for os-acc is out

2018-05-08 Thread Nadathur, Sundar

Hi all,
    The Cyborg compute node specification has been published: 
https://review.openstack.org/#/c/566798/ . Please review it.


The main factors defined in this spec are:
* The behavior with respect to accelerators when various Compute API [1] 
operations are applied. E.g., on a reboot/pause/suspend, the assigned 
accelerators are left intact. But, on a stop or shelve, they are detached.
* The APIs for the newly proposed os-acc library. This is structured 
along the same lines as os-vif usage [2]. Changes are needed in Nova 
compute to invoke os-acc APIs on specific instance-related events.
* Interactions of Cyborg with Glance in the compute node. The plan is to 
use Glance properties. No changes are needed in Glance.


References:
[1] https://developer.openstack.org/api-guide/compute/server_concepts.html
[2] https://docs.openstack.org/os-vif/queens/user/usage.html

Thanks & Regards,
Sundar



Re: [openstack-dev] [cyborg] Promote Li Liu as new core reviewer

2018-04-09 Thread Nadathur, Sundar

Agreed! +1

Regards,
Sundar


Hi Team,

This is an email for my nomination of adding Li Liu to the core 
reviewer team. Li Liu has been instrumental in the resource provider 
data model implementation for Cyborg during Queens release, as well as 
metadata standardization and programming design for Rocky.


His overall stats [0] and current stats [1] for Rocky speak for 
themselves. His patches can be found here [2].


Given the amount of work underway for Rocky, it would be great to 
add such an amazing force :)


[0] 
http://stackalytics.com/?module=cyborg-group&metric=person-day&release=all
[1] 
http://stackalytics.com/?module=cyborg-group&metric=person-day&release=rocky

[2] https://review.openstack.org/#/q/owner:liliueecg%2540gmail.com

--
Zhipeng (Howard) Huang

Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co,. Ltd
Email: huangzhip...@huawei.com 
Office: Huawei Industrial Base, Longgang, Shenzhen

(Previous)
Research Assistant
Mobile Ad-Hoc Network Lab, Calit2
University of California, Irvine
Email: zhipe...@uci.edu 
Office: Calit2 Building Room 2402

OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado




[openstack-dev] [cyborg] Cyborg/Nova scheduling spec

2018-04-03 Thread Nadathur, Sundar
Thanks to everybody who has commented on the Cyborg/Nova scheduling spec 
(https://review.openstack.org/#/c/554717/).


As you may have noted, some issues were raised (*1), discussed (*2) and 
a potential solution was offered (*3). I have tried to synthesize the 
new solution from the Nova team here:

 https://etherpad.openstack.org/p/Cyborg-Nova-Multifunction

This simplifies Cyborg design/implementation, by having the weigher use 
Placement info (no queries or extra info in Cyborg DB), and by opening 
the possibility of removing the weigher altogether if/when Nova supports 
preferred traits.


Please review it. Once that is done, I'll post an update that includes 
the new scheme and addresses any applicable comments in the current spec.


Thank you very much!

(*1) 
http://lists.openstack.org/pipermail/openstack-dev/2018-March/128685.html
(*2) 
http://lists.openstack.org/pipermail/openstack-dev/2018-March/128840.html, 
128889.html, etc.
(*3) 
http://lists.openstack.org/pipermail/openstack-dev/2018-March/12.html


Regards,
Sundar




Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-31 Thread Nadathur, Sundar
Slightly less of a nonstarter, but still likely to get significant
 push-back, is the idea of tweaking traits on the fly.  For example, your
 vGPU case might be modeled as:

 PGPU_RP: {
   inventory: {
       CUSTOM_VGPU_TYPE_A: 2,
       CUSTOM_VGPU_TYPE_B: 4,
   }
   traits: [
       CUSTOM_VGPU_TYPE_A_CAPABLE,
       CUSTOM_VGPU_TYPE_B_CAPABLE,
   ]
 }

         The request would come in for
 resources=CUSTOM_VGPU_TYPE_A:1&required=VGPU_TYPE_A_CAPABLE, resulting
 in an allocation of CUSTOM_VGPU_TYPE_A:1.  Now while you're processing
 that, you would *remove* CUSTOM_VGPU_TYPE_B_CAPABLE from the PGPU_RP.
 So it doesn't matter that there's still inventory of
 CUSTOM_VGPU_TYPE_B:4, because a request including
 required=CUSTOM_VGPU_TYPE_B_CAPABLE won't be satisfied by this RP.
 There's of course a window between when the initial allocation is made
 and when you tweak the trait list.  In that case you'll just have to
 fail the loser.  This would be like any other failure in e.g. the spawn
 process; it would bubble up, the allocation would be removed; retries
 might happen or whatever.

         Like I said, you're likely to get a lot of resistance to
 this idea as
 well.  (Though TBH, I'm not sure how we can stop you beyond -1'ing your
 patches; there's nothing about placement that disallows it.)

         The simple-but-inefficient solution is simply that we'd
 still be able
 to make allocations for vGPU type B, but you would have to fail right
 away when it came down to cyborg to attach the resource.  Which is code
 you pretty much have to write anyway.  It's an improvement if cyborg
 gets to be involved in the post-get-allocation-candidates
 weighing/filtering step, because you can do that check at that point to
 help filter out the candidates that would fail.  Of course there's still
 a race condition there, but it's no different than for any other
 resource.

 efried

 On 03/28/2018 12:27 PM, Nadathur, Sundar wrote:
 > Hi Eric and all,
 >     I should have clarified that this race condition happens only for
 > the case of devices with multiple functions. There is a prior thread
 > <http://lists.openstack.org/pipermail/openstack-dev/2018-March/127882.html>
 > about it. I was trying to get a solution within Cyborg, but that faces
 > this race condition as well.
 >
 > IIUC, this situation is somewhat similar to the issue with vGPU types
 > <http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-27.log.html#t2018-03-27T13:41:00>
 > (thanks to Alex Xu for pointing this out). In the latter case, we
 could
 > start with an inventory of (vgpu-type-a: 2; vgpu-type-b: 4).  But,
 after
 > consuming a unit of  vGPU-type-a, ideally the inventory should change
 > to: (vgpu-type-a: 1; vgpu-type-b: 0). With multi-function
 accelerators,
 > we start with an RP inventory of (region-type-A: 1, function-X:
 4). But,
 > after consuming a unit of that function, ideally the inventory should
 > change to: (region-type-A: 0, function-X: 3).
 >
 > I understand that this approach is controversial :) Also, one
 difference
 > from the vGPU case is that the number and count of vGPU types is
 static,
 > whereas with FPGAs, one could reprogram it to result in more or fewer
 > functions. That said, we could hopefully keep this analogy in mind for
 > future discussions.
 >
 > We probably will not support multi-function accelerators in Rocky.
 This
 > discussion is for the longer term.
 >
 > Regards,
 > Sundar
 >
 > On 3/23/2018 12:44 PM, Eric Fried wrote:
 >> Sundar-
 >>
 >>      First thought is to simplify by NOT keeping inventory
 information in
 >> the cyborg db at all.  The provider record in the placement service
 >> already knows the device (the provider ID, which you can look up
 in the
 >> cyborg db) the host (the root_provider_uuid of the provider
 representing
 >> the device) and the inventory, and (I hope) you'll be augmenting
 it with
 >> traits indicating what functions it's capable of.  That way, you'll
 >> always get allocation candidates with devices that *can* load the
 >> desired function; now you just have to engage your weigher to
 prioritize
 >> the ones that already have it loaded so you can prefer those.
 >>
 >>      Am I missing something?
 >>
 >>              efried

Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-28 Thread Nadathur, Sundar
Thanks, Eric. Looks like there are no good solutions even as candidates, 
but only options with varying levels of unacceptability. It is funny 
that the option considered the least unacceptable is to let the 
problem happen and then fail the request (the last one in your list).


Could I ask what is the objection to the scheme that applies multiple 
traits and removes one as needed, apart from the fact that it has races?


Regards,
Sundar

On 3/28/2018 11:48 AM, Eric Fried wrote:

Sundar-

We're running across this issue in several places right now.   One
thing that's definitely not going to get traction is
automatically/implicitly tweaking inventory in one resource class when
an allocation is made on a different resource class (whether in the same
or different RPs).

Slightly less of a nonstarter, but still likely to get significant
push-back, is the idea of tweaking traits on the fly.  For example, your
vGPU case might be modeled as:

PGPU_RP: {
   inventory: {
   CUSTOM_VGPU_TYPE_A: 2,
   CUSTOM_VGPU_TYPE_B: 4,
   }
   traits: [
   CUSTOM_VGPU_TYPE_A_CAPABLE,
   CUSTOM_VGPU_TYPE_B_CAPABLE,
   ]
}

The request would come in for
resources=CUSTOM_VGPU_TYPE_A:1&required=CUSTOM_VGPU_TYPE_A_CAPABLE, resulting
in an allocation of CUSTOM_VGPU_TYPE_A:1.  Now while you're processing
that, you would *remove* CUSTOM_VGPU_TYPE_B_CAPABLE from the PGPU_RP.
So it doesn't matter that there's still inventory of
CUSTOM_VGPU_TYPE_B:4, because a request including
required=CUSTOM_VGPU_TYPE_B_CAPABLE won't be satisfied by this RP.
There's of course a window between when the initial allocation is made
and when you tweak the trait list.  In that case you'll just have to
fail the loser.  This would be like any other failure in e.g. the spawn
process; it would bubble up, the allocation would be removed; retries
might happen or whatever.
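
A minimal sketch of this trait-removal step, assuming placement's
resource-provider traits API; the endpoint, token, and RP UUID below are
placeholders, not values from this thread:

    # Drop CUSTOM_VGPU_TYPE_B_CAPABLE from the PGPU RP once an
    # allocation of CUSTOM_VGPU_TYPE_A:1 lands. Error handling and
    # retries are omitted.
    import requests

    PLACEMENT = "http://placement.example.com"  # placeholder endpoint
    HEADERS = {"X-Auth-Token": "TOKEN",  # placeholder token
               "OpenStack-API-Version": "placement 1.17"}

    def drop_trait(rp_uuid, trait):
        url = "%s/resource_providers/%s/traits" % (PLACEMENT, rp_uuid)
        cur = requests.get(url, headers=HEADERS).json()
        body = {
            # The generation makes the PUT conditional: if another
            # writer updated the RP first, placement returns 409
            # Conflict and we must re-read and retry -- the same race
            # window discussed above.
            "resource_provider_generation":
                cur["resource_provider_generation"],
            "traits": [t for t in cur["traits"] if t != trait],
        }
        return requests.put(url, json=body,
                            headers=HEADERS).status_code == 200

    drop_trait("pgpu-rp-uuid", "CUSTOM_VGPU_TYPE_B_CAPABLE")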

Like I said, you're likely to get a lot of resistance to this idea as
well.  (Though TBH, I'm not sure how we can stop you beyond -1'ing your
patches; there's nothing about placement that disallows it.)

The simple-but-inefficient solution is simply that we'd still be able
to make allocations for vGPU type B, but you would have to fail right
away when it came down to cyborg to attach the resource.  Which is code
you pretty much have to write anyway.  It's an improvement if cyborg
gets to be involved in the post-get-allocation-candidates
weighing/filtering step, because you can do that check at that point to
help filter out the candidates that would fail.  Of course there's still
a race condition there, but it's no different than for any other resource.

efried

On 03/28/2018 12:27 PM, Nadathur, Sundar wrote:

Hi Eric and all,
     I should have clarified that this race condition happens only for
the case of devices with multiple functions. There is a prior thread
<http://lists.openstack.org/pipermail/openstack-dev/2018-March/127882.html>
about it. I was trying to get a solution within Cyborg, but that faces
this race condition as well.

IIUC, this situation is somewhat similar to the issue with vGPU types
<http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-27.log.html#t2018-03-27T13:41:00>
(thanks to Alex Xu for pointing this out). In the latter case, we could
start with an inventory of (vgpu-type-a: 2; vgpu-type-b: 4).  But, after
consuming a unit of  vGPU-type-a, ideally the inventory should change
to: (vgpu-type-a: 1; vgpu-type-b: 0). With multi-function accelerators,
we start with an RP inventory of (region-type-A: 1, function-X: 4). But,
after consuming a unit of that function, ideally the inventory should
change to: (region-type-A: 0, function-X: 3).

I understand that this approach is controversial :) Also, one difference
from the vGPU case is that the number and count of vGPU types is static,
whereas with FPGAs, one could reprogram it to result in more or fewer
functions. That said, we could hopefully keep this analogy in mind for
future discussions.

We probably will not support multi-function accelerators in Rocky. This
discussion is for the longer term.

Regards,
Sundar

On 3/23/2018 12:44 PM, Eric Fried wrote:

Sundar-

First thought is to simplify by NOT keeping inventory information in
the cyborg db at all.  The provider record in the placement service
already knows the device (the provider ID, which you can look up in the
cyborg db) the host (the root_provider_uuid of the provider representing
the device) and the inventory, and (I hope) you'll be augmenting it with
traits indicating what functions it's capable of.  That way, you'll
always get allocation candidates with devices that *can* load the
desired function; now you just have to engage your weigher to prioritize
the ones that already have it loaded so you can prefer those.

Am I missing something?

efried

On 03/22/2018 11:27 PM, Nadathur, Sundar wrote:

Hi all,
     There seems to be a possibili

Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-28 Thread Nadathur, Sundar

Hi Shaohe,
  I have responded in the Etherpad. The Cyborg/Nova scheduling spec 
details the 4 types of user requests 
<https://git.openstack.org/cgit/openstack/cyborg/tree/doc/specs/rocky/cyborg-nova-sched.rst?h=refs/changes/17/554717/1#n136>. 



I believe you are looking for more details on what the RC names, traits 
and flavors will look like. I will add that to the spec itself.


Thanks,
Sundar

On 3/28/2018 2:10 AM, 少合冯 wrote:

I have summarized some scenarios for FPGA device requests:
https://etherpad.openstack.org/p/cyborg-fpga-request-scenarios

Please add more scenarios, so we can find the exceptional cases where 
Placement cannot satisfy the filtering and weighing.


IMHO, I prefer Placement to do the filtering and weighing. If we have to 
let Cyborg do the filtering and weighing, the Nova scheduler need only 
call Cyborg once for all hosts, even though we do the weighing one host 
at a time.



2018-03-23 12:27 GMT+08:00 Nadathur, Sundar <sundar.nadat...@intel.com>:


Hi all,
    There seems to be a possibility of a race condition in the
Cyborg/Nova flow. Apologies for missing this earlier. (You can
refer to the proposed Cyborg/Nova spec

<https://review.openstack.org/#/c/554717/1/doc/specs/rocky/cyborg-nova-sched.rst>
for details.)

Consider the scenario where the flavor specifies a resource class
for a device type, and also specifies a function (e.g. encrypt) in
the extra specs. The Nova scheduler would only track the device
type as a resource, and Cyborg needs to track the availability of
functions. Further, to keep it simple, say all the functions exist
all the time (no reprogramming involved).

To recap, here is the scheduler flow for this case:

  * A request spec with a flavor comes to Nova
conductor/scheduler. The flavor has a device type as a
resource class, and a function in the extra specs.
  * Placement API returns the list of RPs (compute nodes) which
contain the requested device types (but not necessarily the
function).
  * Cyborg will provide a custom filter which queries Cyborg DB.
This needs to check which hosts contain the needed function,
and filter out the rest.
  * The scheduler selects one node from the filtered list, and the
request goes to the compute node.

For the filter to work, the Cyborg DB needs to maintain a table
with triples of (host, function type, #free units). The filter
checks if a given host has one or more free units of the requested
function type. But, to keep the # free units up to date, Cyborg on
the selected compute node needs to notify the Cyborg API to
decrement the #free units when an instance is spawned, and to
increment them when resources are released.

Therein lies the catch: this loop from the compute node to
controller is susceptible to race conditions. For example, if two
simultaneous requests each ask for function A, and there is only
one unit of that available, the Cyborg filter will approve both,
both may land on the same host, and one will fail. This is because
Cyborg on the controller does not decrement resource usage due to
one request before processing the next request.

This is similar to this previous Nova scheduling issue

<https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/placement-claims.html>.
That was solved by having the scheduler claim a resource in
Placement for the selected node. I don't see an analog for Cyborg,
since it would not know which node is selected.

Thanks in advance for suggestions and solutions.

Regards,
Sundar







__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-28 Thread Nadathur, Sundar

Hi Eric and all,
    I should have clarified that this race condition happens only for 
the case of devices with multiple functions. There is a prior thread 
<http://lists.openstack.org/pipermail/openstack-dev/2018-March/127882.html> 
about it. I was trying to get a solution within Cyborg, but that faces 
this race condition as well.


IIUC, this situation is somewhat similar to the issue with vGPU types 
<http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-27.log.html#t2018-03-27T13:41:00> 
(thanks to Alex Xu for pointing this out). In the latter case, we could 
start with an inventory of (vgpu-type-a: 2; vgpu-type-b: 4).  But, after 
consuming a unit of vGPU-type-a, ideally the inventory should change to: 
(vgpu-type-a: 1; vgpu-type-b: 0). With multi-function accelerators, we 
start with an RP inventory of (region-type-A: 1, function-X: 4). But, 
after consuming a unit of that function, ideally the inventory should 
change to: (region-type-A: 0, function-X: 3).


I understand that this approach is controversial :) Also, one difference 
from the vGPU case is that the number and count of vGPU types is static, 
whereas with FPGAs, one could reprogram it to result in more or fewer 
functions. That said, we could hopefully keep this analogy in mind for 
future discussions.


We probably will not support multi-function accelerators in Rocky. This 
discussion is for the longer term.


Regards,
Sundar

On 3/23/2018 12:44 PM, Eric Fried wrote:

Sundar-

First thought is to simplify by NOT keeping inventory information in
the cyborg db at all.  The provider record in the placement service
already knows the device (the provider ID, which you can look up in the
cyborg db) the host (the root_provider_uuid of the provider representing
the device) and the inventory, and (I hope) you'll be augmenting it with
traits indicating what functions it's capable of.  That way, you'll
always get allocation candidates with devices that *can* load the
desired function; now you just have to engage your weigher to prioritize
the ones that already have it loaded so you can prefer those.

Am I missing something?

efried

On 03/22/2018 11:27 PM, Nadathur, Sundar wrote:

Hi all,
     There seems to be a possibility of a race condition in the
Cyborg/Nova flow. Apologies for missing this earlier. (You can refer to
the proposed Cyborg/Nova spec
<https://review.openstack.org/#/c/554717/1/doc/specs/rocky/cyborg-nova-sched.rst>
for details.)

Consider the scenario where the flavor specifies a resource class for a
device type, and also specifies a function (e.g. encrypt) in the extra
specs. The Nova scheduler would only track the device type as a
resource, and Cyborg needs to track the availability of functions.
Further, to keep it simple, say all the functions exist all the time (no
reprogramming involved).

To recap, here is the scheduler flow for this case:

   * A request spec with a flavor comes to Nova conductor/scheduler. The
 flavor has a device type as a resource class, and a function in the
 extra specs.
   * Placement API returns the list of RPs (compute nodes) which contain
 the requested device types (but not necessarily the function).
   * Cyborg will provide a custom filter which queries Cyborg DB. This
 needs to check which hosts contain the needed function, and filter
 out the rest.
   * The scheduler selects one node from the filtered list, and the
 request goes to the compute node.

For the filter to work, the Cyborg DB needs to maintain a table with
triples of (host, function type, #free units). The filter checks if a
given host has one or more free units of the requested function type.
But, to keep the # free units up to date, Cyborg on the selected compute
node needs to notify the Cyborg API to decrement the #free units when an
instance is spawned, and to increment them when resources are released.

Therein lies the catch: this loop from the compute node to controller is
susceptible to race conditions. For example, if two simultaneous
requests each ask for function A, and there is only one unit of that
available, the Cyborg filter will approve both, both may land on the
same host, and one will fail. This is because Cyborg on the controller
does not decrement resource usage due to one request before processing
the next request.

This is similar to this previous Nova scheduling issue
<https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/placement-claims.html>.
That was solved by having the scheduler claim a resource in Placement
for the selected node. I don't see an analog for Cyborg, since it would
not know which node is selected.

Thanks in advance for suggestions and solutions.

Regards,
Sundar








__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?

Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-25 Thread Nadathur, Sundar

On 3/23/2018 12:44 PM, Eric Fried wrote:

Sundar-

First thought is to simplify by NOT keeping inventory information in
the cyborg db at all.  The provider record in the placement service
already knows the device (the provider ID, which you can look up in the
cyborg db) the host (the root_provider_uuid of the provider representing
the device) and the inventory, and (I hope) you'll be augmenting it with
traits indicating what functions it's capable of.  That way, you'll
always get allocation candidates with devices that *can* load the
desired function; now you just have to engage your weigher to prioritize
the ones that already have it loaded so you can prefer those.

Eric,
   Thanks for the response.

   Traits only indicate whether a qualitative capability exists. To 
check if a free instance of the requested function exists in the host, 
we have to track both the total count and the free count of the needed 
function. Otherwise, we may pick a host because it *can* host a 
function, though it doesn't have a free instance of that function.


IIUC, your reply seems to expect that we can always reprogram a function 
as needed. The specific case we are looking at here is one where no 
reprogramming is involved. In the terminology of the Cyborg/Nova 
scheduling spec <https://review.openstack.org/#/c/554717/>, this is 
the pre-programmed scenario (reasons why an operator may want this are 
stated in the spec). However, even if reprogramming is allowed, to 
prioritize hosts with free instances of the needed function, we will 
need to count how many free instances there are.
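
To make the distinction concrete, here is a rough sketch of the per-host 
counting that traits alone cannot express; all names here are 
hypothetical:

    # A trait check would pass any host that is *capable* of the
    # function; only a count can say whether a free instance remains.
    from collections import namedtuple

    FunctionPool = namedtuple("FunctionPool", "host function total free")

    pools = [
        FunctionPool("host1", "encrypt", total=2, free=0),  # capable, busy
        FunctionPool("host2", "encrypt", total=2, free=1),  # capable, free
    ]

    def hosts_with_free_instance(function):
        return [p.host for p in pools
                if p.function == function and p.free > 0]

    print(hosts_with_free_instance("encrypt"))  # ['host2'], not both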


Since we said that only device types will be tracked as resource 
classes, and not functions, the scheduler will count available instances 
of device types, and Cyborg would have to count the functions separately.


Please let me know if I missed something.

Thanks & Regards,
Sundar


Am I missing something?

efried

On 03/22/2018 11:27 PM, Nadathur, Sundar wrote:

Hi all,
     There seems to be a possibility of a race condition in the
Cyborg/Nova flow. Apologies for missing this earlier. (You can refer to
the proposed Cyborg/Nova spec
<https://review.openstack.org/#/c/554717/1/doc/specs/rocky/cyborg-nova-sched.rst>
for details.)

Consider the scenario where the flavor specifies a resource class for a
device type, and also specifies a function (e.g. encrypt) in the extra
specs. The Nova scheduler would only track the device type as a
resource, and Cyborg needs to track the availability of functions.
Further, to keep it simple, say all the functions exist all the time (no
reprogramming involved).

To recap, here is the scheduler flow for this case:

   * A request spec with a flavor comes to Nova conductor/scheduler. The
 flavor has a device type as a resource class, and a function in the
 extra specs.
   * Placement API returns the list of RPs (compute nodes) which contain
 the requested device types (but not necessarily the function).
   * Cyborg will provide a custom filter which queries Cyborg DB. This
 needs to check which hosts contain the needed function, and filter
 out the rest.
   * The scheduler selects one node from the filtered list, and the
 request goes to the compute node.

For the filter to work, the Cyborg DB needs to maintain a table with
triples of (host, function type, #free units). The filter checks if a
given host has one or more free units of the requested function type.
But, to keep the # free units up to date, Cyborg on the selected compute
node needs to notify the Cyborg API to decrement the #free units when an
instance is spawned, and to increment them when resources are released.

Therein lies the catch: this loop from the compute node to controller is
susceptible to race conditions. For example, if two simultaneous
requests each ask for function A, and there is only one unit of that
available, the Cyborg filter will approve both, both may land on the
same host, and one will fail. This is because Cyborg on the controller
does not decrement resource usage due to one request before processing
the next request.

This is similar to this previous Nova scheduling issue
<https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/placement-claims.html>.
That was solved by having the scheduler claim a resource in Placement
for the selected node. I don't see an analog for Cyborg, since it would
not know which node is selected.

Thanks in advance for suggestions and solutions.

Regards,
Sundar








__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ.

[openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-22 Thread Nadathur, Sundar

Hi all,
    There seems to be a possibility of a race condition in the 
Cyborg/Nova flow. Apologies for missing this earlier. (You can refer to 
the proposed Cyborg/Nova spec 
<https://review.openstack.org/#/c/554717/1/doc/specs/rocky/cyborg-nova-sched.rst> 
for details.)


Consider the scenario where the flavor specifies a resource class for a 
device type, and also specifies a function (e.g. encrypt) in the extra 
specs. The Nova scheduler would only track the device type as a 
resource, and Cyborg needs to track the availability of functions. 
Further, to keep it simple, say all the functions exist all the time (no 
reprogramming involved).


To recap, here is the scheduler flow for this case:

 * A request spec with a flavor comes to Nova conductor/scheduler. The
   flavor has a device type as a resource class, and a function in the
   extra specs.
 * Placement API returns the list of RPs (compute nodes) which contain
   the requested device types (but not necessarily the function).
 * Cyborg will provide a custom filter which queries the Cyborg DB. This
   needs to check which hosts contain the needed function, and filter
   out the rest. (A rough sketch of such a filter follows this list.)
 * The scheduler selects one node from the filtered list, and the
   request goes to the compute node.
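
A rough sketch of such a filter, assuming Nova's BaseHostFilter 
interface; the free-units table and the 'accel:function' extra-spec key 
are hypothetical stand-ins:

    from nova.scheduler import filters

    # Hypothetical stand-in for a query against the Cyborg table of
    # (host, function type, #free units) described below.
    FREE_UNITS = {("host1", "encrypt"): 1, ("host2", "encrypt"): 0}

    class CyborgFunctionFilter(filters.BaseHostFilter):
        def host_passes(self, host_state, spec_obj):
            function = spec_obj.flavor.extra_specs.get("accel:function")
            if not function:
                return True
            # Check-then-act: passing here is not a claim, which is
            # exactly what opens the race described below.
            return FREE_UNITS.get((host_state.host, function), 0) > 0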

For the filter to work, the Cyborg DB needs to maintain a table with 
triples of (host, function type, #free units). The filter checks if a 
given host has one or more free units of the requested function type. 
But, to keep the # free units up to date, Cyborg on the selected compute 
node needs to notify the Cyborg API to decrement the #free units when an 
instance is spawned, and to increment them when resources are released.


Therein lies the catch: this loop from the compute node to controller is 
susceptible to race conditions. For example, if two simultaneous 
requests each ask for function A, and there is only one unit of that 
available, the Cyborg filter will approve both, both may land on the 
same host, and one will fail. This is because Cyborg on the controller 
does not decrement resource usage due to one request before processing 
the next request.
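
Concretely, with one free unit of function A on host H, the window looks 
like this:

    t0: filter(request-1) reads free_units(H, A) == 1 -> approves H
    t1: filter(request-2) reads free_units(H, A) == 1 -> approves H (stale)
    t2: H spawns request-1; Cyborg agent notifies API; free_units(H, A) = 0
    t3: H attempts request-2; no free unit of A remains -> spawn fails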


This is similar to this previous Nova scheduling issue 
<https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/placement-claims.html>. 
That was solved by having the scheduler claim a resource in Placement 
for the selected node. I don't see an analog for Cyborg, since it would 
not know which node is selected.


Thanks in advance for suggestions and solutions.

Regards,
Sundar






__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova] [Cyborg] Separate spec for compute node flows?

2018-03-21 Thread Nadathur, Sundar

Hi all,

    The Cyborg Nova scheduling specification 
<https://review.openstack.org/#/c/554717/> 
addresses the scheduling aspects alone. There needs to be a separate 
spec to address:


* Cyborg/Nova interactions in the compute node, incl. the newly proposed 
os-acc library.

* Programming, including fetching bitstreams from Glance.
* Bitstream metadata.

Shall I send such a spec while the first one is still in review?

Regards,
Sundar


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-18 Thread Nadathur, Sundar

Sorry for the delayed response. I broadly agree with previous replies.

For the concerns about the impact of the Cyborg weigher on scheduling 
performance, there are some options (apart from filtering candidates as 
much as possible in Placement):
* Handle hosts in bulk by extending BaseWeigher 
<https://github.com/openstack/nova/blob/master/nova/weights.py#L67> and 
overriding weigh_objects() 
<https://github.com/openstack/nova/blob/master/nova/weights.py#L92>, 
instead of handling one host at a time. (A rough sketch follows below.)
* If we have to handle one host at a time for whatever reason, since the 
weigher is maintained by Cyborg, it could directly query the Cyborg DB 
rather than go through the Cyborg REST API. This would be not unlike 
other weighers.

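As a rough sketch of the bulk option, assuming the BaseWeigher interface 
linked above; the Cyborg DB query is a hypothetical stand-in:

    from nova.scheduler import weights

    class _StubCyborgDB(object):
        # Hypothetical stand-in for one bulk query against the Cyborg DB.
        def get_free_function_counts(self, hosts, function):
            return {h: 1 for h in hosts}

    cyborg_db = _StubCyborgDB()

    class CyborgWeigher(weights.BaseHostWeigher):
        def weigh_objects(self, weighed_obj_list, weight_properties):
            # One call for the whole host list, instead of one per host.
            hosts = [w.obj.host for w in weighed_obj_list]
            free = cyborg_db.get_free_function_counts(hosts, "encrypt")
            # Return raw weights; the framework normalizes them and
            # applies the multiplier.
            return [float(free.get(w.obj.host, 0))
                    for w in weighed_obj_list]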

Given these and other possible optimizations, it may be too soon to 
worry about the performance impact.


I am working on a spec that will capture the flow discussed in the PTG. 
I will try to address these aspects as well.


Thanks & Regards,
Sundar

On 3/8/2018 4:53 AM, Zhipeng Huang wrote:
@jay I'm also against a weigher in nova/placement. This should be an 
optional step depending on the vendor implementation, not a default one.


@Alex I think we should explore the idea of preferred trait.

@Matthew: Like Sean said, Cyborg wants to support both reprogrammable 
FPGAs and pre-programmed ones.
Therefore it is correct that in your description, the programming 
operation should be a call from Nova to Cyborg, and cyborg will 
complete the operation while nova waits. The only problem is that the 
weigher step should be an optional one.



On Wed, Mar 7, 2018 at 9:21 PM, Jay Pipes <jaypi...@gmail.com> wrote:

    On 03/06/2018 09:36 PM, Alex Xu wrote:

        2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:

            2018-03-06 22:45 GMT+08:00 Mooney, Sean K
            <sean.k.moo...@intel.com>:

                *From:* Matthew Booth [mailto:mbo...@redhat.com]
                *Sent:* Saturday, March 3, 2018 4:15 PM
                *To:* OpenStack Development Mailing List (not for usage
                questions) <openstack-dev@lists.openstack.org>
                *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking
                multiple functions

                On 2 March 2018 at 14:31, Jay Pipes
                <jaypi...@gmail.com> wrote:

                    On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:

                        Hello Nova team,

                        During the Cyborg discussion at Rocky PTG, we
                        proposed a flow for FPGAs wherein the request
                        spec asks for a device type as a resource class,
                        and optionally a function (such as encryption)
                        in the extra specs. This does not seem to work
                        well for the usage model that I'll describe
                        below.

                        An FPGA device may implement more than one
                        function. For example, it may implement both
                        compression and encryption. Say a cluster has 10
                        devices of device type X, and each of them is
                        programmed to offer 2 instances of function A
                        and 4 instances of function B. More
                        specifically, the device may implement 6 PCI
                        functions, with 2 of them tied to function A,
                        and the other 4 tied to function B. So, we could
                        have 6 separate instances accessing functions on
                        the same device.

                Does this imply that Cyborg can't reprogram the FPGA at
                all?

                [Mooney, Sean K] Cyborg is intended to support
                fixed-function accelerators also, so it will not always
                be able to program the accelerator. In this case, where
                an FPGA is preprogrammed with a multi-function bitstream
                that is statically provisioned, Cyborg will not be able
                to reprogram the slot if any 

Re: [openstack-dev] [cyborg]Summary of Mar 14 Meeting

2018-03-17 Thread Nadathur, Sundar

Hi Howard and all,

    Re. my AR to write a spec, please confirm the following:

* Since the weigher is part of the overall scheduling flow, I presume 
the spec has to cover the scheduling flow that we hashed out in the PTG. 
The compute node aspects could be a separate spec.


* Since there were many questions about the use cases as well, they 
would also need to be covered in the spec.


* This spec would be complementary to the current Cyborg-Nova spec 
<https://review.openstack.org/#/c/554717/>. (It is in addition to it, 
and does not replace it.)


* The spec is not confined to FPGAs but should cover all devices, just 
as the current Cyborg-Nova spec 
<https://review.openstack.org/#/c/554717/> does.


Thanks,

Sundar


On 3/15/2018 9:00 PM, Zhipeng Huang wrote:

Hi Team,

Here is the meeting summary for our post-PTG kickoff meeting.

[...]
2. Rocky Cycle Task Assignments:

Please refer to the meeting minutes about the action items: 
http://eavesdrop.openstack.org/meetings/openstack_cyborg/2018/openstack_cyborg.2018-03-14-14.07.html 


--
Zhipeng (Howard) Huang

Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co,. Ltd
Email: huangzhip...@huawei.com 
Office: Huawei Industrial Base, Longgang, Shenzhen

(Previous)
Research Assistant
Mobile Ad-Hoc Network Lab, Calit2
University of California, Irvine
Email: zhipe...@uci.edu 
Office: Calit2 Building Room 2402

OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cyborg]Weekly Team Meeting 2018.03.14 Agenda (No Time Change For US)

2018-03-13 Thread Nadathur, Sundar
Hi Howard,
Can we discuss the possibility of using a filter/weigher that invokes 
the Cyborg API, as we discussed during the Cyborg/Nova session at the PTG?

This is line 56 in 
https://etherpad.openstack.org/p/cyborg-ptg-rocky-nova-cyborg-interaction .

Regards,
Sundar

From: Zhipeng Huang [mailto:zhipengh...@gmail.com]
Sent: Monday, March 12, 2018 1:28 AM
To: OpenStack Development Mailing List (not for usage questions) 

Cc: Konstantinos Samaras-Tsakiris ; 
Dutch Althoff 
Subject: [openstack-dev] [cyborg]Weekly Team Meeting 2018.03.14 Agenda (No Time 
Change For US)

Hi Team,

We will resume the team meeting this week. The meeting starting time is still 
ET 10:00am/PT 7:00am, whereas in China it has moved one hour earlier to 10:00pm. 
For Europe, please refer to UTC 1400 as the baseline.

This week we will have a special 2-hour meeting. In the first hour we will 
have Shaohe demo the PoC that the Intel dev team has conducted, and in the 
second half we will confirm the tasks and milestones for Rocky based upon 
the PTG discussion 
(summary sent out last Friday).

ZOOM link will be provided before the meeting :)

If there are any other topics anyone would like to propose, feel free to reply 
to this email thread.

--
Zhipeng (Howard) Huang

Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co,. Ltd
Email: huangzhip...@huawei.com
Office: Huawei Industrial Base, Longgang, Shenzhen

(Previous)
Research Assistant
Mobile Ad-Hoc Network Lab, Calit2
University of California, Irvine
Email: zhipe...@uci.edu
Office: Calit2 Building Room 2402

OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova] [Cyborg] Tracking multiple functions

2018-03-02 Thread Nadathur, Sundar
Hello Nova team,
During the Cyborg discussion at Rocky PTG, we proposed a flow for FPGAs 
wherein the request spec asks for a device type as a resource class, and 
optionally a function (such as encryption) in the extra specs. This does not 
seem to work well for the usage model that I'll describe below.

An FPGA device may implement more than one function. For example, it may 
implement both compression and encryption. Say a cluster has 10 devices of 
device type X, and each of them is programmed to offer 2 instances of function 
A and 4 instances of function B. More specifically, the device may implement 6 
PCI functions, with 2 of them tied to function A, and the other 4 tied to 
function B. So, we could have 6 separate instances accessing functions on the 
same device.

In the current flow, the device type X is modeled as a resource class, so 
Placement will count how many of them are in use. A flavor for 'RC 
device-type-X + function A' will consume one instance of the RC device-type-X.  
But this is not right because this precludes other functions on the same device 
instance from getting used.

One way to solve this is to declare functions A and B as resource classes 
themselves and have the flavor request the function RC. Placement will then 
correctly count the function instances. However, there is still a problem: if 
the requested function A is not available, Placement will return an empty list 
of RPs, but we need some way to reprogram some device to create an instance of 
function A.
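
To illustrate the two models for one device of type X (the custom 
resource class names here are hypothetical):

    # Device type as the resource class: consuming 'RC device-type-X +
    # function A' uses up the whole device in Placement's accounting.
    rp_device_model = {
        "inventory": {"CUSTOM_DEVICE_TYPE_X": 1},
    }

    # Functions as resource classes: Placement counts function instances
    # correctly, but returns no candidates once function A is exhausted,
    # even though a device could be reprogrammed to offer more of A.
    rp_function_model = {
        "inventory": {
            "CUSTOM_FUNCTION_A": 2,
            "CUSTOM_FUNCTION_B": 4,
        },
    }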

Regards,
Sundar

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev