Christopher,

Thanks for the inquiry.  Let me try to address your questions based on 
the motivation behind the platform maturity requirements.

First paragraph:  As you know, the requirements are being measured by 
project.  Now within a project, there may be runtime and design-time 
components.  In those cases, the sub-components should be measured on the 
appropriate requirements for that component.

Second paragraph:  Sorry if I missed a question on the wiki, but there will 
likely not be an external validation of the resiliency requirements.  As a 
project you will report whether you support the level 2 requirements.  The 
Benchmark team will hopefully be able to suggest tests that you can run to 
self-validate.

Third paragraph:  Appreciate your input here.  We actually tried to remove 
any implementation specification from the requirement.  The key is the 
spirit of the requirement that resiliency be handled in an automated 
fashion and not require manual intervention.

                - Are clients that call APIs on ONAP components that claim 
to be level 2 required to implement retry logic upon failure? 
                - There is no requirement for Beijing for clients of ONAP 
components to retry (though this is a typical best practice).  MSB is 
doing some work with sidecars which often implement retry logic 
automatically.  However, if your "automatic detection and rerouting" logic 
is not provided on the server-side, the expectation is that you would 
provide a client SDK that implements retry and rerouting.

                - How quickly must a component recover to achieve level 2 
status? 
                - There is no requirement for recovery time for resiliency 
level 2.

                - What are the requirements for in-process transactions on 
a failed component? 
                - For resiliency level 2, there is no requirement to 
recover in-process transactions on a failed component.  For level 3, there 
should be a plan for reducing the # of lost transactions when there is a 
failure (this could be done by reducing detection time).

                - Is detection of a failure the responsibility of the 
individual components, or will something outside the component detect the 
failure and initiate a response?  (The purpose of DCAE as a platform is to 
do just that, by the way.) 
                - Ideally, we'd like to see a common capability for 
failure detection and rerouting across all of ONAP, though projects still 
have the independence to implement their own approach.  Many projects are 
leveraging what OOM provides via Kubernetes.

                - For the baseline measures of failed requests and data 
loss, what are the load conditions under which that will be tested?
                - As discussed above, this is up to the project.
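
To illustrate the client-side retry and rerouting mentioned in the first 
answer above, here is a minimal sketch in Python.  It is not ONAP or MSB 
code: the function name, the exception class, the endpoint URLs and the 
retry/backoff parameters are all assumptions made up for the example, and a 
real SDK would likely need more (timeouts, health checks, idempotency 
handling).

```python
import time
from typing import Callable, Optional, Sequence


class AllEndpointsFailed(Exception):
    """Raised when every endpoint has failed on every attempt."""


def call_with_failover(
    endpoints: Sequence[str],
    send: Callable[[str], str],
    attempts_per_endpoint: int = 2,
    backoff_seconds: float = 0.0,
) -> str:
    """Try each endpoint in order, retrying before rerouting to the next.

    `send` is a hypothetical transport function supplied by the caller; it
    takes an endpoint and either returns a response or raises
    ConnectionError.
    """
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        for attempt in range(attempts_per_endpoint):
            try:
                return send(endpoint)
            except ConnectionError as exc:
                last_error = exc
                # Exponential backoff before the next try (0 disables it).
                time.sleep(backoff_seconds * (2 ** attempt))
    raise AllEndpointsFailed(f"no endpoint answered: {last_error}")


if __name__ == "__main__":
    # Simulated transport: the first instance is down, the second answers.
    def fake_send(endpoint: str) -> str:
        if endpoint == "http://instance-a.example":
            raise ConnectionError("instance-a is down")
        return f"response from {endpoint}"

    print(call_with_failover(
        ["http://instance-a.example", "http://instance-b.example"],
        fake_send,
    ))
```

The caller never sees the failure of the first instance; it only sees an 
error if every known instance fails, which is the "no manual intervention" 
behavior the requirement is after.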

I hope this helps.


Regards,
Jason Hunt 
Executive Software Architect, IBM 

Phone: 314-749-7422
Email: djh...@us.ibm.com
Twitter: @DJHunt
 



From:   "RATH, CHRISTOPHER A (CHRISTOPHER A)" <c...@research.att.com>
To:     "djh...@us.ibm.com" <djh...@us.ibm.com>
Cc:     "onap-...@lists.onap.org" <onap-...@lists.onap.org>, 
"onap-discuss@lists.onap.org" <onap-discuss@lists.onap.org>
Date:   01/24/2018 08:01 AM
Subject:        Platform Maturity Requirements



Jason,
 
My team contributed the majority of the DCAEGEN2 project code.  I have a 
few questions about meeting the platform maturity requirements, 
specifically for the resiliency and scaling portions.
 
First, DCAE is a platform, not a component.  It consists of many different 
components, some developed by us and others open source projects.  When 
asked to commit to one of the platform maturity levels, we attempted 
to propose a middle ground between level 1 and level 2 for most of the 
areas, because some of the components can probably meet the level 2 
requirements in the given timeframe, but others will not.  This was not 
deemed acceptable, so I am trying to determine how complex components like 
DCAE are going to be tested as a whole for meeting these requirements.  Is 
each sub-component evaluated separately?  If not, what constitutes a 
failure for DCAEGEN2 as a whole?
 
Second, assuming that the tests will be done on individual subcomponents 
of DCAE, the resiliency and scalability requirements do not give me enough 
information for my team to try and meet them.  I had posted a question on 
the wiki page asking for a description of the test environment under which 
these platform maturity level claims were to be tested, but haven’t seen a 
response yet.  This includes a description of what tests will be run and 
what the expected outcomes of those tests are based on the claimed level of 
maturity by the component.  For example, one way to achieve resiliency 
would be to have three copies of every DCAE component.  However, given the 
backlash against the size of DCAE today, I doubt that will be an 
acceptable solution, but there are no requirements on which to design a 
different solution.
 
Third, the requirements, as noted by other posters from the community, are 
written in a way that assumes a particular implementation.  For 
resiliency, stating that the component has to detect failure and reroute 
presupposes that multiple copies of the component are running and 
something in front of them detects the failure automatically and routes to 
a working instance.  The requirements ought to be stated in terms of the 
net effects of failures on the clients of the component or the running 
system as a whole.  For example, a level 3+ requirement may be that 
clients accessing a component API do not get a failure and get a 
successful outcome in a component failure scenario within 75% of the mean 
response time when there are no failures.  Stating these requirements in 
these terms obviates the need to distinguish between stateful and 
stateless components.  Those become implementation details.
 
Some specific areas to address for Level 2:
                - Are clients that call APIs on ONAP components that claim 
to be level 2 required to implement retry logic upon failure?
                - How quickly must a component recover to achieve level 2 
status?
                - What are the requirements for in-process transactions on 
a failed component?
                - Is detection of a failure the responsibility of the 
individual components, or will something outside the component detect the 
failure and initiate a response?  (The purpose of DCAE as a platform is to 
do just that, by the way.)
                - For the baseline measures of failed requests and data 
loss, what are the load conditions under which that will be tested?
 
Most of this applies to the scalability requirements as well, but in 
addition it isn’t clear whether components are supposed to detect the need 
to scale themselves, or if something external to the component determines 
there is a need to scale and the component needs to support the APIs to 
allow that external component to request a scale up or down.
 
Thanks for looking into this.
--
Christopher A. Rath
Director Inventive Science – Intelligent Systems Research Department
Advanced Technologies & Platforms 
D2 Architecture & Design
AT&T Services, Inc.
 


_______________________________________________
onap-discuss mailing list
onap-discuss@lists.onap.org
https://lists.onap.org/mailman/listinfo/onap-discuss
