Re: [openstack-dev] [heat] Kubernetes AutoScaling with Heat AutoScalingGroup and Ceilometer
On Mon, Apr 27, 2015 at 12:28:01PM -0400, Rabi Mishra wrote:

> Hi All,
>
> Deploying a Kubernetes (k8s) cluster on any OpenStack based cloud for
> container based workloads is a standard deployment pattern. However,
> auto-scaling this cluster based on load would require some integration
> between k8s and OpenStack components. While looking at the option of
> leveraging Heat ASG to achieve autoscaling, I came across a few
> requirements that the list can discuss and arrive at the best possible
> solution.
>
> A typical k8s deployment scenario on OpenStack would be as below.
>
> - Master (single VM)
> - Minions/Nodes (AutoScalingGroup)
>
> AutoScaling of the cluster would involve both scaling of minions/nodes
> and scaling of Pods (ReplicationControllers).
>
> 1. Scaling Nodes/Minions:
>
> We already have utilization stats collected at the hypervisor level, as
> the ceilometer compute agent polls the local libvirt daemon to acquire
> performance data for the local instances/nodes.

I really doubt those metrics are useful enough to trigger a scaling
operation. My suspicion is based on two assumptions: 1) autoscaling
requests should come from the user application or service, not from the
control plane; the application knows best whether scaling is needed;
2) hypervisor level metrics may be misleading in some cases. For
example, they cannot give an accurate CPU utilization number in the case
of CPU overcommit, which is a common practice.

> Also, Kubelet (running on the node) collects the cAdvisor stats.
> However, cAdvisor stats are not fed back to the scheduler at present,
> and the scheduler uses a simple round-robin method for scheduling.

It looks like a multi-layer resource management problem which needs a
holistic design. I'm not quite sure whether scheduling at the container
layer alone can help improve resource utilization.

> Req 1: We would need a way to push stats from the kubelet/cAdvisor to
> ceilometer directly, or via the master (using heapster). Alarms based
> on these stats can then be used to scale up/down the ASG.
To send a sample to ceilometer for triggering autoscaling, we will need
some user credentials to authenticate with keystone (even with trusts).
We need to pass the project-id in and out so that ceilometer will know
the correct scope for evaluation. We also need a standard way to tag
samples with the stack ID, and maybe also the ASG ID. I'd love to see
this done transparently, i.e. no matching_metadata or query confusions.

> There is an existing blueprint[1] for an inspector implementation for
> the docker hypervisor (nova-docker). However, we would probably
> require an agent running on the nodes or the master to send the
> cAdvisor or heapster stats to ceilometer. I've seen some discussions
> on the possibility of leveraging keystone trusts with the ceilometer
> client.

An agent is needed, definitely.

> Req 2: The Autoscaling Group is expected to notify the master that a
> node has been added/removed. Before removing a node, the
> master/scheduler has to mark the node as unschedulable.

A little bit confused here ... are we scaling the containers or the
nodes or both?

> Req 3: Notify containers/pods that the node is about to be removed, so
> they can stop accepting traffic and persist data. It would also
> require a cooldown period before the node removal.

There have been some discussions on sending messages, but so far I don't
think there is a conclusion on a generic solution.

Just my $0.02. BTW, we have been looking into similar problems in the
Senlin project.

Regards,
Qiming

> Both requirements 2 and 3 would probably require generating scaling
> event notifications/signals for the master and containers to consume,
> and probably some ASG lifecycle hooks.
>
> Req 4: In case of too many 'pending' pods to be scheduled, the
> scheduler would signal the ASG to scale up. This is similar to Req 1.
>
> 2. Scaling Pods
>
> Currently, manual scaling of pods is possible by resizing
> ReplicationControllers. The k8s community is working on an
> abstraction, AutoScaler[2], on top of ReplicationController (RC) that
> provides intention/rule based autoscaling.
> There would be a requirement to collect cAdvisor/Heapster stats to
> signal the AutoScaler too. Probably this is beyond the scope of
> OpenStack.
>
> Any thoughts and ideas on how to realize this use-case would be
> appreciated.
>
> [1] https://review.openstack.org/gitweb?p=openstack%2Fceilometer-specs.git;a=commitdiff;h=6ea7026b754563e18014a32e16ad954c86bd8d6b
> [2] https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/proposals/autoscaling.md
>
> Regards,
> Rabi Mishra

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
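For reference, the Heat side of the alarm-driven node scaling discussed
above usually follows the standard HOT autoscaling pattern sketched
below: an ASG of minions, a scale-up policy, and a ceilometer alarm
scoped to the stack via matching_metadata (exactly the part that would
ideally be made transparent). The meter name is a placeholder for
whatever a cAdvisor/heapster agent would push.

```yaml
resources:
  minion_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 10
      resource:
        type: OS::Nova::Server        # minion definition elided
        properties:
          # tags each instance's samples so the alarm can match them
          metadata: {"metering.stack": {get_param: "OS::stack_id"}}

  scale_up_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: minion_group}
      cooldown: 60
      scaling_adjustment: 1

  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: k8s.node.cpu_util   # hypothetical meter from the agent
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 80
      comparison_operator: gt
      alarm_actions:
        - {get_attr: [scale_up_policy, alarm_url]}
      matching_metadata:
        {'metadata.user_metadata.stack': {get_param: "OS::stack_id"}}
```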
Re: [openstack-dev] [heat] Kubernetes AutoScaling with Heat AutoScalingGroup and Ceilometer
You can take a look at the Murano Kubernetes package. There is no
autoscaling out of the box, but it would be quite trivial to add a new
action for that, as there are functions to add new etcd and Kubernetes
nodes on the master, as well as a function to add a new VM.

Here is an example of a scaleUp action:
https://github.com/gokrokvertskhov/murano-app-incubator/blob/monitoring-ha/io.murano.apps.java.HelloWorldCluster/Classes/HelloWorldCluster.murano#L93

Here is the Kubernetes scaleUp action:
https://github.com/openstack/murano-apps/blob/master/Docker/Kubernetes/KubernetesCluster/package/Classes/KubernetesCluster.yaml#L441

And here is the place where the Kubernetes master is updated with the
new node info:
https://github.com/openstack/murano-apps/blob/master/Docker/Kubernetes/KubernetesCluster/package/Classes/KubernetesMinionNode.yaml#L90

As you can see, cAdvisor is set up on the new node too.

Thanks
Gosha
Re: [openstack-dev] [heat] Kubernetes AutoScaling with Heat AutoScalingGroup and Ceilometer
----- Original Message -----
> On Mon, Apr 27, 2015 at 12:28:01PM -0400, Rabi Mishra wrote:
> > Hi All,
> >
> > Deploying a Kubernetes (k8s) cluster on any OpenStack based cloud
> > for container based workloads is a standard deployment pattern.
> > However, auto-scaling this cluster based on load would require some
> > integration between k8s and OpenStack components. While looking at
> > the option of leveraging Heat ASG to achieve autoscaling, I came
> > across a few requirements that the list can discuss and arrive at
> > the best possible solution.
> >
> > A typical k8s deployment scenario on OpenStack would be as below.
> >
> > - Master (single VM)
> > - Minions/Nodes (AutoScalingGroup)
> >
> > AutoScaling of the cluster would involve both scaling of
> > minions/nodes and scaling of Pods (ReplicationControllers).
> >
> > 1. Scaling Nodes/Minions:
> >
> > We already have utilization stats collected at the hypervisor level,
> > as the ceilometer compute agent polls the local libvirt daemon to
> > acquire performance data for the local instances/nodes.
>
> I really doubt those metrics are useful enough to trigger a scaling
> operation. My suspicion is based on two assumptions: 1) autoscaling
> requests should come from the user application or service, not from
> the control plane; the application knows best whether scaling is
> needed; 2) hypervisor level metrics may be misleading in some cases.
> For example, they cannot give an accurate CPU utilization number in
> the case of CPU overcommit, which is a common practice.

I agree that correct utilization statistics are complex with virtual
infrastructure. However, I think physical+hypervisor metrics (collected
by the compute agent) should be a good point to start.

> > Also, Kubelet (running on the node) collects the cAdvisor stats.
> > However, cAdvisor stats are not fed back to the scheduler at
> > present, and the scheduler uses a simple round-robin method for
> > scheduling.
>
> It looks like a multi-layer resource management problem which needs a
> holistic design. I'm not quite sure whether scheduling at the
> container layer alone can help improve resource utilization.
The k8s scheduler is going to improve over time to use the
cAdvisor/heapster metrics for better scheduling. IMO, we should leave
that for k8s to handle. My point is about getting those metrics to
ceilometer, either from the nodes or from the scheduler/master.

> > Req 1: We would need a way to push stats from the kubelet/cAdvisor
> > to ceilometer directly, or via the master (using heapster). Alarms
> > based on these stats can then be used to scale up/down the ASG.
>
> To send a sample to ceilometer for triggering autoscaling, we will
> need some user credentials to authenticate with keystone (even with
> trusts). We need to pass the project-id in and out so that ceilometer
> will know the correct scope for evaluation. We also need a standard
> way to tag samples with the stack ID, and maybe also the ASG ID. I'd
> love to see this done transparently, i.e. no matching_metadata or
> query confusions.
>
> > There is an existing blueprint[1] for an inspector implementation
> > for the docker hypervisor (nova-docker). However, we would probably
> > require an agent running on the nodes or the master to send the
> > cAdvisor or heapster stats to ceilometer. I've seen some discussions
> > on the possibility of leveraging keystone trusts with the ceilometer
> > client.
>
> An agent is needed, definitely.
>
> > Req 2: The Autoscaling Group is expected to notify the master that a
> > node has been added/removed. Before removing a node, the
> > master/scheduler has to mark the node as unschedulable.
>
> A little bit confused here ... are we scaling the containers or the
> nodes or both?

We would only be focusing on the nodes. However, adding/removing nodes
without the k8s master/scheduler knowing about it (so that it can
schedule pods or make them unschedulable) would be useless.

> > Req 3: Notify containers/pods that the node is about to be removed,
> > so they can stop accepting traffic and persist data. It would also
> > require a cooldown period before the node removal.
>
> There have been some discussions on sending messages, but so far I
> don't think there is a conclusion on a generic solution.
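The "mark the node as unschedulable" step from Req 2 could look roughly
like the sketch below, which mirrors what cordoning a node does: a
strategic merge patch against the k8s nodes API. The endpoint, node
name, and helper name are illustrative assumptions; authentication is
elided.

```python
import json

# Sketch of Req 2's pre-removal step: build the strategic-merge-patch body
# that flips a node's spec.unschedulable flag. Sending it would be an HTTP
# PATCH to the master, e.g. /api/v1/nodes/<name>, with content type
# 'application/strategic-merge-patch+json' (endpoint assumed, auth elided).

def cordon_patch(unschedulable=True):
    """Return the JSON PATCH body that makes a node (un)schedulable."""
    return json.dumps({"spec": {"unschedulable": unschedulable}})

# Hypothetical usage against a master at k8s-master:8080:
#   import urllib.request
#   req = urllib.request.Request(
#       "http://k8s-master:8080/api/v1/nodes/minion-0042",
#       data=cordon_patch().encode(),
#       headers={"Content-Type": "application/strategic-merge-patch+json"},
#       method="PATCH")
#   urllib.request.urlopen(req)
print(cordon_patch())
```

After the cooldown from Req 3 expires, the same patch with
`unschedulable=False` would be the natural way to undo the cordon if the
removal is cancelled.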
> Just my $0.02.

Thanks, Qiming.

> BTW, we have been looking into similar problems in the Senlin project.

Great. We can probably discuss these during the Summit? I assume there
is already a session on Senlin planned, right?

> Regards,
> Qiming
>
> > Both requirements 2 and 3 would probably require generating scaling
> > event notifications/signals for the master and containers to
> > consume, and probably some ASG lifecycle hooks.
> >
> > Req 4: In case of too many 'pending' pods to be scheduled, the
> > scheduler would signal the ASG to scale up. This is similar to
> > Req 1.
> >
> > 2. Scaling Pods
> >
> > Currently, manual scaling of pods is possible by resizing
> > ReplicationControllers. The k8s community is working on an
> > abstraction, AutoScaler[2], on top of ReplicationController (RC)
> > that provides intention/rule based autoscaling. There would be a
> > requirement to
[openstack-dev] [heat] Kubernetes AutoScaling with Heat AutoScalingGroup and Ceilometer
Hi All,

Deploying a Kubernetes (k8s) cluster on any OpenStack based cloud for
container based workloads is a standard deployment pattern. However,
auto-scaling this cluster based on load would require some integration
between k8s and OpenStack components. While looking at the option of
leveraging Heat ASG to achieve autoscaling, I came across a few
requirements that the list can discuss and arrive at the best possible
solution.

A typical k8s deployment scenario on OpenStack would be as below.

- Master (single VM)
- Minions/Nodes (AutoScalingGroup)

AutoScaling of the cluster would involve both scaling of minions/nodes
and scaling of Pods (ReplicationControllers).

1. Scaling Nodes/Minions:

We already have utilization stats collected at the hypervisor level, as
the ceilometer compute agent polls the local libvirt daemon to acquire
performance data for the local instances/nodes. Also, Kubelet (running
on the node) collects the cAdvisor stats. However, cAdvisor stats are
not fed back to the scheduler at present, and the scheduler uses a
simple round-robin method for scheduling.

Req 1: We would need a way to push stats from the kubelet/cAdvisor to
ceilometer directly, or via the master (using heapster). Alarms based on
these stats can then be used to scale up/down the ASG.

There is an existing blueprint[1] for an inspector implementation for
the docker hypervisor (nova-docker). However, we would probably require
an agent running on the nodes or the master to send the cAdvisor or
heapster stats to ceilometer. I've seen some discussions on the
possibility of leveraging keystone trusts with the ceilometer client.

Req 2: The Autoscaling Group is expected to notify the master that a
node has been added/removed. Before removing a node, the
master/scheduler has to mark the node as unschedulable.

Req 3: Notify containers/pods that the node is about to be removed, so
they can stop accepting traffic and persist data. It would also require
a cooldown period before the node removal.
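As a concrete sketch of Req 1, pushing an agent-collected node metric
into ceilometer as a user-defined sample could look like the following.
The meter name and the user_metadata keys are illustrative assumptions
(no convention exists yet), and the client call shown in the comment
assumes python-ceilometerclient's v2 `samples.create` API.

```python
# Sketch of Req 1: build a ceilometer v2 sample for a cAdvisor-derived node
# CPU metric, tagged with the stack and ASG so an alarm scoped to the group
# can match it. Meter name and metadata keys are hypothetical.

def build_sample(cpu_util, node_id, project_id, stack_id, asg_id):
    """Build a sample payload; keys under 'user_metadata' surface to alarm
    queries as 'metadata.user_metadata.<key>'."""
    return {
        "counter_name": "k8s.node.cpu_util",  # hypothetical meter name
        "counter_type": "gauge",
        "counter_unit": "%",
        "counter_volume": cpu_util,
        "project_id": project_id,             # scope for alarm evaluation
        "resource_id": node_id,
        "resource_metadata": {
            "user_metadata": {"stack": stack_id, "asg": asg_id},
        },
    }

sample = build_sample(85.0, "minion-0042", "proj-id", "stack-id", "asg-id")

# Actually sending it needs keystone credentials (or a trust), e.g.:
#   from ceilometerclient import client
#   cc = client.get_client('2', os_username=..., os_password=...,
#                          os_tenant_name=..., os_auth_url=...)
#   cc.samples.create(**sample)
print(sample["counter_name"])
```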
Both requirements 2 and 3 would probably require generating scaling
event notifications/signals for the master and containers to consume,
and probably some ASG lifecycle hooks.

Req 4: In case of too many 'pending' pods to be scheduled, the scheduler
would signal the ASG to scale up. This is similar to Req 1.

2. Scaling Pods

Currently, manual scaling of pods is possible by resizing
ReplicationControllers. The k8s community is working on an abstraction,
AutoScaler[2], on top of ReplicationController (RC) that provides
intention/rule based autoscaling. There would be a requirement to
collect cAdvisor/Heapster stats to signal the AutoScaler too. Probably
this is beyond the scope of OpenStack.

Any thoughts and ideas on how to realize this use-case would be
appreciated.

[1] https://review.openstack.org/gitweb?p=openstack%2Fceilometer-specs.git;a=commitdiff;h=6ea7026b754563e18014a32e16ad954c86bd8d6b
[2] https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/proposals/autoscaling.md

Regards,
Rabi Mishra
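Req 4 could be prototyped with a small loop on the master: count pods
stuck in 'Pending' and, past a threshold, hit the scale-up policy's
pre-signed webhook (Heat's ScalingPolicy exposes one as `alarm_url`).
The threshold and the pod-phase list format below are assumptions for
illustration.

```python
# Sketch of Req 4: decide whether to signal the ASG to scale up based on how
# many pods are stuck in 'Pending'. In practice the phases would come from
# the k8s API; here they are a plain list of strings.

def should_scale_up(pod_phases, max_pending=3):
    """Return True when more than max_pending pods are waiting for a node."""
    pending = sum(1 for phase in pod_phases if phase == "Pending")
    return pending > max_pending

phases = ["Running", "Pending", "Pending", "Pending", "Pending"]
if should_scale_up(phases):
    # A Heat ScalingPolicy exposes a pre-signed URL; an empty POST to it
    # triggers the adjustment, e.g.:
    #   requests.post(alarm_url)  # alarm_url from get_attr on the policy
    print("signal scale-up")  # prints here: 4 pending > 3
```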