On Fri, Feb 20, 2015 at 8:53 AM, Rohit Yadav <[email protected]> wrote:
> Hi,
>
> I'm trying to explore how to make systemvms more robust and
> fault-tolerant, and the manual/automated QA of systemvms. One of the
> common user facing issues related to scalability was the reset
> password/key servers where the VR serves data using socat etc using
> forking mechanisms and global locks. This slows down the processes such
> as reset password.
>
> More here: https://issues.apache.org/jira/browse/CLOUDSTACK-8272
>
> One of the blindly thrown solutions includes increasing the VR RAM which
> works for at scale but then seems to fail again when the load is
> increased beyond a point. I don't know of any performance and stress
> testing reports that tell us about these bottlenecks. Please share if
> you have done anything in this regard.
>

Increasing the RAM is probably a perfectly fine solution if you're running basic networking or a small advanced-networking infrastructure, but once you get to hundreds or thousands of VRs, even a bump of 128 MB is a major cost (100 VRs × 128 MB = roughly 12.5 GB of additional RAM). Keeping the footprint small is therefore important.

> I want to do couple of things:
>
> - Explore systemvm build changes using newer tools such as packer
> - Cleanup script execution and code in resource layer
> - Start replacing bash scripts with more robust implementations, perhaps
> a single or few agents on VRs that provide non-hardcoded well-documented
> interfaces
> - Right now everything in VR/systemvms is sort of hardcoded and the
> services/interfaces are not well-documented. The idea is to refactor and
> wrap everything we want to do with the systemvms in a general agents
> framework that provides monitoring and managing the VRs (do stuff like
> upgrades etc to combat things like ghost, poodle issues):
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Agents+Framework
>
> What are the other issues you've had in past that you would like to be
> improved?
>

I like all those ideas.
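To make the RAM-bump arithmetic concrete, here is a minimal sketch that multiplies a uniform per-VR memory bump by the number of VRs. It assumes every VR receives the same bump and treats "MB" as MiB (1 GiB = 1024 MiB); the function name is purely illustrative.

```python
def extra_ram_mib(num_vrs: int, bump_mib: int) -> int:
    """Total additional RAM (MiB), assuming every VR gets the same bump."""
    return num_vrs * bump_mib


if __name__ == "__main__":
    # Scale the same 128 MiB bump across a few deployment sizes.
    for vrs in (100, 1000):
        total = extra_ram_mib(vrs, 128)
        print(f"{vrs} VRs x 128 MiB = {total} MiB ({total / 1024:.1f} GiB)")
```

At 1000 VRs the same 128 MiB bump already adds about 125 GiB across the cloud, which is why the per-VR footprint matters more than any single VR's headroom.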
In general, our pain points are:

- extraordinary upgrades outside the regular version upgrades (POODLE etc.)
- upgrading the system VMs during version upgrades. Most of the time the changes in the system VMs are minor and could be handled with some simple upgrade scripts on the VMs.

--
Erik
