----- Original Message -----
> On Mon, Apr 27, 2015 at 12:28:01PM -0400, Rabi Mishra wrote:
> > Hi All,
> >
> > Deploying a Kubernetes (k8s) cluster on any OpenStack based cloud for
> > container based workloads is a standard deployment pattern. However,
> > auto-scaling this cluster based on load would require some integration
> > between k8s and OpenStack components. While looking at the option of
> > leveraging Heat ASG to achieve autoscaling, I came across a few
> > requirements that the list can discuss and arrive at the best possible
> > solution.
> >
> > A typical k8s deployment scenario on OpenStack would be as below.
> >
> > - Master (single VM)
> > - Minions/Nodes (AutoScalingGroup)
> >
> > AutoScaling of the cluster would involve both scaling of minions/nodes
> > and scaling of Pods (ReplicationControllers).
> >
> > 1. Scaling Nodes/Minions:
> >
> > We already have utilization stats collected at the hypervisor level, as
> > the ceilometer compute agent polls the local libvirt daemon to acquire
> > performance data for the local instances/nodes.
>
> I really doubt those metrics are useful enough to trigger a scaling
> operation. My suspicion is based on two assumptions: 1) autoscaling
> requests should come from the user application or service, not from the
> control plane, as the application knows best whether scaling is needed;
> 2) hypervisor level metrics may be misleading in some cases. For
> example, they cannot give an accurate CPU utilization number in the case
> of CPU overcommit, which is a common practice.

I agree that getting correct utilization statistics is complex with virtual
infrastructure. However, I think the physical+hypervisor metrics (collected
by the compute agent) should be a good point to start.
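To make that concrete, here is a rough, untested sketch of what reading
those compute-agent samples for the group members could look like with
python-ceilometerclient. The metadata field used in the query and the
credentials are placeholders, not a settled convention:

    # Read the cpu_util samples the compute agent already collects for
    # the ASG members and print per-period averages.
    from ceilometerclient import client

    cclient = client.get_client(
        2,
        os_username='demo',            # placeholder credentials
        os_password='secret',
        os_tenant_name='demo',
        os_auth_url='http://keystone:5000/v2.0')

    # Assumed tagging convention: nova metadata 'metering.stack' showing
    # up as resource_metadata.user_metadata.stack in ceilometer.
    query = [{'field': 'metadata.user_metadata.stack',
              'op': 'eq',
              'value': 'STACK_ID'}]

    # Average CPU utilization over 10-minute windows, as seen by libvirt.
    for stat in cclient.statistics.list(meter_name='cpu_util',
                                        q=query, period=600):
        print('%s  avg=%.2f%%' % (stat.period_start, stat.avg))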
> > Also, the Kubelet (running on the node) collects the cAdvisor stats.
> > However, cAdvisor stats are not fed back to the scheduler at present,
> > and the scheduler uses a simple round-robin method for scheduling.
>
> It looks like a multi-layer resource management problem which needs a
> holistic design. I'm not quite sure whether scheduling at the container
> layer alone can help improve resource utilization or not.

The k8s scheduler is going to improve over time to use the cAdvisor/heapster
metrics for better scheduling. IMO, we should leave that for k8s to handle.
My point is about getting those metrics to ceilometer, either from the nodes
or from the scheduler/master.

> > Req 1: We would need a way to push stats from the kubelet/cAdvisor to
> > ceilometer, directly or via the master (using heapster). Alarms based on
> > these stats can then be used to scale up/down the ASG.
>
> To send a sample to ceilometer for triggering autoscaling, we will need
> some user credentials to authenticate with keystone (even with trusts).
> We need to pass the project-id in and out so that ceilometer will know
> the correct scope for evaluation. We also need a standard way to tag
> samples with the stack ID and maybe also the ASG ID. I'd love to see
> this done transparently, i.e. no matching_metadata or query confusions.
>
> > There is an existing blueprint[1] for an inspector implementation for
> > the docker hypervisor (nova-docker). However, we would probably require
> > an agent running on the nodes or the master to send the cAdvisor or
> > heapster stats to ceilometer. I've seen some discussions on the
> > possibility of leveraging keystone trusts with the ceilometer client.
>
> An agent is needed, definitely.
>
> > Req 2: The AutoScalingGroup is expected to notify the master that a node
> > has been added/removed. Before removing a node, the master/scheduler has
> > to mark the node as unschedulable.
>
> A little bit confused here ... are we scaling the containers or the
> nodes or both?

We would only be focusing on the nodes. However, adding/removing nodes
without the k8s master/scheduler knowing about it (so that it can schedule
pods on them or mark them unschedulable) would be useless. A rough sketch of
the corresponding API call is at the bottom of this mail.

> > Req 3: Notify containers/pods that the node would be removed, so that
> > they can stop accepting traffic and persist data. It would also require
> > a cooldown period before the node removal.
>
> There have been some discussions on sending messages, but so far I don't
> think there is a conclusion on a generic solution.
>
> Just my $0.02.

Thanks Qiming.

> BTW, we have been looking into similar problems in the Senlin project.

Great. We can probably discuss these during the Summit? I assume there is
already a session on Senlin planned, right?

> Regards,
> Qiming
>
> > Both requirements 2 and 3 would probably require generating scaling
> > event notifications/signals for the master and containers to consume,
> > and probably some ASG lifecycle hooks.
> >
> >
> > Req 4: In case of too many 'pending' pods to be scheduled, the scheduler
> > would signal the ASG to scale up. This is similar to Req 1.
> >
> >
> > 2. Scaling Pods
> >
> > Currently, manual scaling of pods is possible by resizing
> > ReplicationControllers. The k8s community is working on an abstraction,
> > AutoScaler[2], on top of the ReplicationController (RC) that provides
> > intention/rule based autoscaling. There would be a requirement to
> > collect cAdvisor/heapster stats to signal the AutoScaler too. Probably
> > this is beyond the scope of OpenStack.
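Coming back to Req 1: whatever form the agent takes, on the OpenStack side
it would essentially be pushing heapster/cAdvisor-derived node metrics into
ceilometer as samples, tagged so that the alarm can scope its query to the
stack/ASG (Qiming's point above). A very rough sketch, with the meter name,
resource id, tagging convention and credentials all being assumptions:

    # Agent-side sketch: publish one node-level gauge sample to ceilometer.
    from ceilometerclient import client

    cclient = client.get_client(
        2,
        os_username='k8s-agent',            # placeholder; ideally trust-scoped
        os_password='secret',
        os_tenant_name='demo',
        os_auth_url='http://keystone:5000/v2.0')

    cclient.samples.create(
        counter_name='k8s.node.cpu.usage',  # hypothetical meter name
        counter_type='gauge',
        counter_unit='%',
        counter_volume=73.5,                # value read from heapster/cAdvisor
        resource_id='minion-0',             # hypothetical node/instance id
        resource_metadata={
            # assumed tag so the ASG alarm can match on
            # metadata.user_metadata.stack
            'user_metadata': {'stack': 'STACK_ID'}})

An alarm on such a meter could then drive the ASG scale-up/scale-down URLs
the same way existing cpu_util based autoscaling templates do.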
> >
> > Any thoughts and ideas on how to realize this use-case would be
> > appreciated.
> >
> > [1]
> > https://review.openstack.org/gitweb?p=openstack%2Fceilometer-specs.git;a=commitdiff;h=6ea7026b754563e18014a32e16ad954c86bd8d6b
> > [2]
> > https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/proposals/autoscaling.md
> >
> > Regards,
> > Rabi Mishra
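One more concrete bit on Req 2, since that is where Heat and the k8s master
have to meet: before the ASG removes a minion, something has to mark the
corresponding node unschedulable so the scheduler stops placing pods on it.
A very rough sketch below; the apiserver endpoint, API version and field
name are assumptions based on the current k8s docs, and authentication is
left out:

    # Fetch the node object, set spec.unschedulable, and write it back.
    import requests

    API = 'http://k8s-master:8080/api/v1beta3'   # assumed apiserver endpoint
    NODE = 'minion-0'                            # hypothetical node name

    node = requests.get('%s/nodes/%s' % (API, NODE)).json()
    node.setdefault('spec', {})['unschedulable'] = True
    resp = requests.put('%s/nodes/%s' % (API, NODE), json=node)
    resp.raise_for_status()

This is the kind of thing an ASG lifecycle hook (or a pre-delete signal to
the master) would need to trigger.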
