On Mon, Apr 27, 2015 at 12:28:01PM -0400, Rabi Mishra wrote:
> Hi All,
>
> Deploying a Kubernetes (k8s) cluster on any OpenStack-based cloud for
> container-based workloads is a standard deployment pattern. However,
> auto-scaling this cluster based on load would require some integration
> between k8s and OpenStack components. While looking at the option of
> leveraging Heat ASG to achieve autoscaling, I came across a few
> requirements that the list can discuss and arrive at the best possible
> solution.
>
> A typical k8s deployment scenario on OpenStack would be as below:
>
> - Master (single VM)
> - Minions/Nodes (AutoScalingGroup)
>
> AutoScaling of the cluster would involve both scaling of minions/nodes
> and scaling of Pods (ReplicationControllers).
>
> 1. Scaling Nodes/Minions:
>
> We already have utilization stats collected at the hypervisor level, as
> the ceilometer compute agent polls the local libvirt daemon to acquire
> performance data for the local instances/nodes.
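[For context, the hypervisor-level figure mentioned above is derived
from libvirt's cumulative per-domain cpu_time counter. A sketch of the
arithmetic only, not ceilometer's actual inspector code:]

```python
# Sketch: how a hypervisor-level cpu_util figure is typically derived
# from libvirt's cumulative cpu_time counter (nanoseconds). Illustrates
# the arithmetic only; not ceilometer's inspector implementation.

def cpu_util(prev_cpu_time_ns, cur_cpu_time_ns, interval_s, vcpus):
    """Percentage of the guest's allotted CPU used over the interval."""
    used_ns = cur_cpu_time_ns - prev_cpu_time_ns
    available_ns = interval_s * 1e9 * vcpus
    return 100.0 * used_ns / available_ns

# A 2-vCPU guest that burned 6s of CPU time in a 10s window:
print(cpu_util(0, 6e9, 10, 2))  # 30.0
```

[Note this is utilization of the *allotted* vCPUs; with CPU overcommit
on the host, the guest's real share can be smaller, which is exactly
the caveat raised below.]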
I really doubt those metrics are useful enough to trigger a scaling
operation. My suspicion is based on two assumptions: 1) autoscaling
requests should come from the user application or service, not from the
control plane; the application knows best whether scaling is needed;
2) hypervisor-level metrics may be misleading in some cases. For
example, they cannot give an accurate CPU utilization number in the
case of CPU overcommit, which is a common practice.

> Also, the Kubelet (running on the node) collects the cAdvisor stats.
> However, cAdvisor stats are not fed back to the scheduler at present,
> and the scheduler uses a simple round-robin method for scheduling.

It looks like a multi-layer resource management problem which needs a
holistic design. I'm not quite sure whether scheduling at the container
layer alone can help improve resource utilization or not.

> Req 1: We would need a way to push stats from the kubelet/cAdvisor to
> ceilometer directly or via the master (using heapster). Alarms based
> on these stats can then be used to scale up/down the ASG.

To send a sample to ceilometer for triggering autoscaling, we will need
some user credentials to authenticate with keystone (even with trusts).
We need to pass the project-id in and out so that ceilometer will know
the correct scope for evaluation. We also need a standard way to tag
samples with the stack ID and maybe also the ASG ID. I'd love to see
this done transparently, i.e. no matching_metadata or query confusions.

> There is an existing blueprint[1] for an inspector implementation for
> the docker hypervisor (nova-docker). However, we would probably
> require an agent running on the nodes or master to send the cAdvisor
> or heapster stats to ceilometer. I've seen some discussions on the
> possibility of leveraging keystone trusts with the ceilometer client.

An agent is needed, definitely.

> Req 2: The AutoScaling Group is expected to notify the master that a
> new node has been added/removed.
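[To make Req 1 concrete, here is a minimal sketch of what an agent-side
push might look like with python-ceilometerclient. The meter name and
the metering.stack / metering.asg_id metadata keys are assumptions
about a tagging convention, not an established standard:]

```python
# Sketch of an agent pushing a cAdvisor-derived meter into ceilometer
# so a Heat/ceilometer alarm can key on it. The meter name and the
# metadata keys are assumed conventions, not an existing standard.

def build_sample(node_id, stack_id, asg_id, cpu_pct):
    """Assemble kwargs for ceilometerclient v2 samples.create()."""
    return {
        'counter_name': 'k8s.node.cpu_util',   # hypothetical meter name
        'counter_type': 'gauge',
        'counter_unit': '%',
        'counter_volume': cpu_pct,
        'resource_id': node_id,
        'resource_metadata': {
            'metering.stack': stack_id,    # assumed tagging convention
            'metering.asg_id': asg_id,     # assumed tagging convention
        },
    }

def post_sample(client, sample):
    # client would come from ceilometerclient.client.get_client('2', ...)
    # authenticated with a trust-scoped token; not invoked here.
    return client.samples.create(**sample)

sample = build_sample('node-1234', 'stack-abcd', 'asg-ef01', 85.2)
print(sample['counter_name'])
```

[The alarm evaluator would then need its matching_metadata/query to
select on those same keys, which is the "confusion" to be avoided.]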
> Before removing a node, the master/scheduler has to mark the node as
> unschedulable.

A little bit confused here ... are we scaling the containers or the
nodes, or both?

> Req 3: Notify containers/pods that the node would be removed, so they
> can stop accepting any traffic and persist data. It would also require
> a cooldown period before the node removal.

There have been some discussions on sending messages, but so far I
don't think there is a conclusion on a generic solution.

Just my $0.02. BTW, we have been looking into similar problems in the
Senlin project.

Regards,
  Qiming

> Both requirements 2 and 3 would probably require generating scaling
> event notifications/signals for the master and containers to consume,
> and probably some ASG lifecycle hooks.
>
> Req 4: In case of too many 'pending' pods to be scheduled, the
> scheduler would signal the ASG to scale up. This is similar to Req 1.
>
> 2. Scaling Pods
>
> Currently, manual scaling of pods is possible by resizing
> ReplicationControllers. The k8s community is working on an
> abstraction, AutoScaler[2], on top of ReplicationController (RC) that
> provides intention/rule-based autoscaling. There would be a
> requirement to collect cAdvisor/Heapster stats to signal the
> AutoScaler too. Probably this is beyond the scope of OpenStack.
>
> Any thoughts and ideas on how to realize this use-case would be
> appreciated.
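[The Req 4 loop quoted above (pending pods trigger an ASG scale-up
signal) could be sketched as a pure decision function. The threshold,
pods-per-node estimate, and cap are hypothetical knobs, not part of any
existing k8s or Heat interface:]

```python
# Sketch of the Req 4 decision: when too many pods sit in 'Pending',
# signal the ASG for more minions. Threshold/step/max are hypothetical
# knobs, not part of any existing k8s or Heat interface.

def desired_minions(current, pending_pods, pods_per_node=4,
                    pending_threshold=2, max_minions=10):
    """Return the minion count the ASG should be scaled to."""
    if pending_pods <= pending_threshold:
        return current  # backlog within tolerance; no signal
    # ceil-divide the backlog into whole extra nodes
    extra = -(-pending_pods // pods_per_node)
    return min(current + extra, max_minions)

print(desired_minions(3, 1))   # 3  (backlog within threshold)
print(desired_minions(3, 9))   # 6  (9 pending / 4 per node -> +3)
```

[The interesting part is not the arithmetic but who runs it and how the
"scale to N" signal reaches the ASG with proper credentials, which is
the same trust/scoping problem as Req 1.]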
>
> [1] https://review.openstack.org/gitweb?p=openstack%2Fceilometer-specs.git;a=commitdiff;h=6ea7026b754563e18014a32e16ad954c86bd8d6b
> [2] https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/proposals/autoscaling.md
>
> Regards,
> Rabi Mishra

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
