----- Original Message -----
> On Mon, Apr 27, 2015 at 12:28:01PM -0400, Rabi Mishra wrote:
> > Hi All,
> >
> > Deploying a Kubernetes (k8s) cluster on any OpenStack based cloud for
> > container based workloads is a standard deployment pattern. However,
> > auto-scaling this cluster based on load would require some integration
> > between k8s and OpenStack components. While looking at the option of
> > leveraging Heat ASG to achieve autoscaling, I came across a few
> > requirements that the list can discuss and arrive at the best possible
> > solution.
> >
> > A typical k8s deployment scenario on OpenStack would be as below.
> >
> > - Master (single VM)
> > - Minions/Nodes (AutoScalingGroup)
> >
> > AutoScaling of the cluster would involve both scaling of minions/nodes
> > and scaling of Pods (ReplicationControllers).
> >
> > 1. Scaling Nodes/Minions:
> >
> > We already have utilization stats collected at the hypervisor level, as
> > the ceilometer compute agent polls the local libvirt daemon to acquire
> > performance data for the local instances/nodes.
>
> I really doubt those metrics are useful enough to trigger a scaling
> operation. My suspicion is based on two assumptions: 1) autoscaling
> requests should come from the user application or service, not from the
> control plane, as the application knows best whether scaling is needed;
> 2) hypervisor level metrics may be misleading in some cases. For
> example, they cannot give an accurate CPU utilization number in the case
> of CPU overcommit, which is a common practice.

I agree that getting correct utilization statistics is complex with virtual
infrastructure. However, I think the physical+hypervisor metrics (collected
by the compute agent) should be a good point to start.
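To make that concrete, here is a rough, untested sketch of what reading
those compute-agent samples for the group members could look like with
python-ceilometerclient. The metadata field used in the query and the
credentials are placeholders, not a settled convention:

    # Read the cpu_util samples the compute agent already collects for
    # the ASG members and print per-period averages.
    from ceilometerclient import client

    cclient = client.get_client(
        2,
        os_username='demo',            # placeholder credentials
        os_password='secret',
        os_tenant_name='demo',
        os_auth_url='http://keystone:5000/v2.0')

    # Assumed tagging convention: nova metadata 'metering.stack' showing
    # up as resource_metadata.user_metadata.stack in ceilometer.
    query = [{'field': 'metadata.user_metadata.stack',
              'op': 'eq',
              'value': 'STACK_ID'}]

    # Average CPU utilization over 10-minute windows, as seen by libvirt.
    for stat in cclient.statistics.list(meter_name='cpu_util',
                                        q=query, period=600):
        print('%s  avg=%.2f%%' % (stat.period_start, stat.avg))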
> > Also, the Kubelet (running on the node) collects the cAdvisor stats.
> > However, cAdvisor stats are not fed back to the scheduler at present,
> > and the scheduler uses a simple round-robin method for scheduling.
>
> It looks like a multi-layer resource management problem which needs a
> holistic design. I'm not quite sure whether scheduling at the container
> layer alone can help improve resource utilization or not.

The k8s scheduler is going to improve over time to use the cAdvisor/heapster
metrics for better scheduling. IMO, we should leave that for k8s to handle.
My point is about getting those metrics to ceilometer, either from the nodes
or from the scheduler/master.

> > Req 1: We would need a way to push stats from the kubelet/cAdvisor to
> > ceilometer, directly or via the master (using heapster). Alarms based on
> > these stats can then be used to scale up/down the ASG.
>
> To send a sample to ceilometer for triggering autoscaling, we will need
> some user credentials to authenticate with keystone (even with trusts).
> We need to pass the project-id in and out so that ceilometer will know
> the correct scope for evaluation. We also need a standard way to tag
> samples with the stack ID and maybe also the ASG ID. I'd love to see
> this done transparently, i.e. no matching_metadata or query confusions.
>
> > There is an existing blueprint[1] for an inspector implementation for
> > the docker hypervisor (nova-docker). However, we would probably require
> > an agent running on the nodes or the master to send the cAdvisor or
> > heapster stats to ceilometer. I've seen some discussions on the
> > possibility of leveraging keystone trusts with the ceilometer client.
>
> An agent is needed, definitely.
>
> > Req 2: The AutoScalingGroup is expected to notify the master that a node
> > has been added/removed. Before removing a node, the master/scheduler has
> > to mark the node as unschedulable.
>
> A little bit confused here ... are we scaling the containers or the
> nodes or both?

We would only be focusing on the nodes. However, adding/removing nodes
without the k8s master/scheduler knowing about it (so that it can schedule
pods on them or mark them unschedulable) would be useless. A rough sketch of
the corresponding API call is at the bottom of this mail.

> > Req 3: Notify containers/pods that the node would be removed, so that
> > they can stop accepting traffic and persist data. It would also require
> > a cooldown period before the node removal.
>
> There have been some discussions on sending messages, but so far I don't
> think there is a conclusion on a generic solution.
>
> Just my $0.02.

Thanks Qiming.

> BTW, we have been looking into similar problems in the Senlin project.

Great. We can probably discuss these during the Summit? I assume there is
already a session on Senlin planned, right?

> Regards,
> Qiming
>
> > Both requirements 2 and 3 would probably require generating scaling
> > event notifications/signals for the master and containers to consume,
> > and probably some ASG lifecycle hooks.
> >
> >
> > Req 4: In case of too many 'pending' pods to be scheduled, the scheduler
> > would signal the ASG to scale up. This is similar to Req 1.
> >
> >
> > 2. Scaling Pods
> >
> > Currently, manual scaling of pods is possible by resizing
> > ReplicationControllers. The k8s community is working on an abstraction,
> > AutoScaler[2], on top of the ReplicationController (RC) that provides
> > intention/rule based autoscaling. There would be a requirement to
> > collect cAdvisor/heapster stats to signal the AutoScaler too. Probably
> > this is beyond the scope of OpenStack.
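Coming back to Req 1: whatever form the agent takes, on the OpenStack side
it would essentially be pushing heapster/cAdvisor-derived node metrics into
ceilometer as samples, tagged so that the alarm can scope its query to the
stack/ASG (Qiming's point above). A very rough sketch, with the meter name,
resource id, tagging convention and credentials all being assumptions:

    # Agent-side sketch: publish one node-level gauge sample to ceilometer.
    from ceilometerclient import client

    cclient = client.get_client(
        2,
        os_username='k8s-agent',            # placeholder; ideally trust-scoped
        os_password='secret',
        os_tenant_name='demo',
        os_auth_url='http://keystone:5000/v2.0')

    cclient.samples.create(
        counter_name='k8s.node.cpu.usage',  # hypothetical meter name
        counter_type='gauge',
        counter_unit='%',
        counter_volume=73.5,                # value read from heapster/cAdvisor
        resource_id='minion-0',             # hypothetical node/instance id
        resource_metadata={
            # assumed tag so the ASG alarm can match on
            # metadata.user_metadata.stack
            'user_metadata': {'stack': 'STACK_ID'}})

An alarm on such a meter could then drive the ASG scale-up/scale-down URLs
the same way existing cpu_util based autoscaling templates do.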
> >
> > Any thoughts and ideas on how to realize this use-case would be
> > appreciated.
> >
> > [1]
> > https://review.openstack.org/gitweb?p=openstack%2Fceilometer-specs.git;a=commitdiff;h=6ea7026b754563e18014a32e16ad954c86bd8d6b
> > [2]
> > https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/proposals/autoscaling.md
> >
> > Regards,
> > Rabi Mishra
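One more concrete bit on Req 2, since that is where Heat and the k8s master
have to meet: before the ASG removes a minion, something has to mark the
corresponding node unschedulable so the scheduler stops placing pods on it.
A very rough sketch below; the apiserver endpoint, API version and field
name are assumptions based on the current k8s docs, and authentication is
left out:

    # Fetch the node object, set spec.unschedulable, and write it back.
    import requests

    API = 'http://k8s-master:8080/api/v1beta3'   # assumed apiserver endpoint
    NODE = 'minion-0'                            # hypothetical node name

    node = requests.get('%s/nodes/%s' % (API, NODE)).json()
    node.setdefault('spec', {})['unschedulable'] = True
    resp = requests.put('%s/nodes/%s' % (API, NODE), json=node)
    resp.raise_for_status()

This is the kind of thing an ASG lifecycle hook (or a pre-delete signal to
the master) would need to trigger.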
