Christopher,

Thanks for the inquiry. Let me do my best to address your questions based on the motivation behind the platform maturity requirements.

First paragraph: As you know, the requirements are measured by project. Within a project, there may be runtime and design-time components. In those cases, each sub-component should be measured against the requirements appropriate for that component.

Second paragraph: Sorry if I missed a question on the wiki, but there will likely not be an external validation of the resiliency requirements. As a project, you will report whether you support the level 2 requirements. The Benchmark team will hopefully be able to suggest tests that you can run to self-validate.

Third paragraph: I appreciate your input here. We actually tried to remove any implementation specification from the requirement. The key is the spirit of the requirement: that resiliency be handled in an automated fashion and not require manual intervention.

- Are clients that call APIs on ONAP components that claim to be level 2 required to implement retry logic upon failure?
  There is no requirement for Beijing for clients of ONAP components to retry (though this is a typical best practice). MSB is doing some work with sidecars, which often implement retry logic automatically. However, if your "automatic detection and rerouting" logic is not provided on the server side, the expectation is that you would provide a client SDK that implements retry and rerouting.

- How quickly must a component recover to achieve level 2 status?
  There is no requirement for recovery time for resiliency level 2.

- What are the requirements for in-process transactions on a failed component?
  For resiliency level 2, there is no requirement to recover in-process transactions on a failed component. For level 3, there should be a plan for reducing the number of lost transactions when there is a failure (this could be done by reducing detection time).

- Is detection of a failure the responsibility of the individual components, or will something outside the component detect the failure and initiate a response? (The purpose of DCAE as a platform is to do just that, by the way.)
  Ideally, we'd like to see a common capability for failure detection and rerouting across all of ONAP, though projects still have the independence to implement their own approach. Many projects are leveraging what OOM provides via Kubernetes.

- For the baseline measures of failed requests and data loss, what are the load conditions under which that will be tested?
  As discussed above, this is up to the project.

I hope this helps.

Regards,
Jason Hunt
Executive Software Architect, IBM
Phone: 314-749-7422
Email: djh...@us.ibm.com
Twitter: @DJHunt

From: "RATH, CHRISTOPHER A (CHRISTOPHER A)" <c...@research.att.com>
To: "djh...@us.ibm.com" <djh...@us.ibm.com>
Cc: "onap-...@lists.onap.org" <onap-...@lists.onap.org>, "onap-discuss@lists.onap.org" <onap-discuss@lists.onap.org>
Date: 01/24/2018 08:01 AM
Subject: Platform Maturity Requirements

Jason,

My team contributed the majority of the DCAEGEN2 project code. I have a few questions about meeting the platform maturity requirements, specifically for the resiliency and scaling portions.

First, DCAE is a platform, not a component. It consists of many different components; some were developed by us, and others are open source projects. When asked to commit to one of the platform maturity levels, we attempted to propose a middle ground between level 1 and level 2 for most of the areas, because some of the components can probably meet the level 2 requirements in the given timeframe, but others will not. This was not deemed acceptable, so I am trying to determine how complex components like DCAE are going to be tested as a whole for meeting these requirements. Is each sub-component evaluated separately? If not, what constitutes a failure for DCAEGEN2 as a whole?

Second, assuming that the tests will be done on individual subcomponents of DCAE, the resiliency and scalability requirements do not give me enough information for my team to try to meet them.
I had posted a question on the wiki page asking for a description of the test environment under which these platform maturity level claims were to be tested, but haven’t seen a response yet. This includes a description of what tests will be run and what the expected outcomes of those tests are, based on the level of maturity claimed by the component. For example, one way to achieve resiliency would be to have three copies of every DCAE component. However, given the backlash against the size of DCAE today, I doubt that will be an acceptable solution, but there are no requirements on which to design a different solution.

Third, the requirements, as noted by other posters from the community, are written in a way that assumes a particular implementation. For resiliency, stating that the component has to detect failure and reroute presupposes that multiple copies of the component are running and that something in front of them detects the failure automatically and routes to a working instance. The requirements ought to be stated in terms of the net effects of failures on the clients of the component or on the running system as a whole. For example, a level 3+ requirement may be that clients accessing a component API do not get a failure and get a successful outcome in a component failure scenario within 75% of the mean response time when there are no failures. Stating the requirements in these terms obviates the need to distinguish between stateful and stateless components. Those become implementation details.

Some specific areas to address for Level 2:

- Are clients that call APIs on ONAP components that claim to be level 2 required to implement retry logic upon failure?
- How quickly must a component recover to achieve level 2 status?
- What are the requirements for in-process transactions on a failed component?
- Is detection of a failure the responsibility of the individual components, or will something outside the component detect the failure and initiate a response? (The purpose of DCAE as a platform is to do just that, by the way.)
- For the baseline measures of failed requests and data loss, what are the load conditions under which that will be tested?

Most of this applies to the scalability requirements as well, but in addition it isn’t clear whether components are supposed to detect the need to scale themselves, or whether something external to the component determines that there is a need to scale and the component needs to support the APIs that allow that external component to request a scale up or down.

Thanks for looking into this.

--
Christopher A. Rath
Director Inventive Science – Intelligent Systems Research Department
Advanced Technologies & Platforms
D2 Architecture & Design
AT&T Services, Inc.
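
For reference, the client-side retry and rerouting that the reply above expects from a client SDK (when detection and rerouting are not provided on the server side) could look roughly like the following. This is only a minimal Python sketch under stated assumptions: `call_with_failover`, `AllInstancesFailed`, and the instance callables are hypothetical names and are not part of any ONAP, MSB, or OOM API, and a `ConnectionError` stands in for whatever failure a real client would detect.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class AllInstancesFailed(Exception):
    """Raised when every instance failed in every round (hypothetical name)."""


def call_with_failover(
    instances: Sequence[Callable[[], T]],
    max_rounds: int = 2,
    backoff_seconds: float = 0.0,
) -> T:
    """Try each instance in order and reroute to the next one on failure.

    If a full pass over all instances fails, wait (exponential backoff)
    and start another pass, up to max_rounds passes in total.
    """
    last_error: Optional[Exception] = None
    for round_index in range(max_rounds):
        for instance in instances:
            try:
                return instance()
            except ConnectionError as exc:
                # Assumption: a ConnectionError means this instance is down,
                # so the call is rerouted to the next instance in the list.
                last_error = exc
        if round_index < max_rounds - 1:
            time.sleep(backoff_seconds * (2 ** round_index))
    raise AllInstancesFailed(
        f"no instance responded after {max_rounds} rounds"
    ) from last_error


# Usage: the first (failed) instance is skipped without manual intervention.
def failed_instance() -> str:
    raise ConnectionError("instance down")


def healthy_instance() -> str:
    return "ok"


result = call_with_failover([failed_instance, healthy_instance])
```

The sketch makes no claim about recovery time or in-process transactions, which matches the level 2 answers above: a request sent to a failed instance is simply reissued against another instance.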
_______________________________________________ onap-discuss mailing list onap-discuss@lists.onap.org https://lists.onap.org/mailman/listinfo/onap-discuss