[ https://issues.apache.org/jira/browse/WHIRR-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994888#comment-12994888 ]

David Alves commented on WHIRR-238:
-----------------------------------

Ah, ok, I get your point :)
Ok, here are my thoughts:
My particular use case requires that coordinators be able to move from one 
node to another when the previous coordinator leaves the cluster. This is not 
only for fault tolerance; it is actually a functional requirement.
The way I had thought about it, metrics producers would know (through zk) when 
another coordinator was up and would send the metrics there over a 
point-to-point connection. AFAIK, multicast is the only way in Ganglia for 
metrics-producing nodes not to have to know where the metrics consumer is, and 
multicast would not work in EC2 (I don't know about the other cloud providers).
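To make that concrete, the producer side could be as simple as the sketch 
below (the znode path and class names are just placeholders, not a proposed 
API):

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical sketch, not existing Whirr code: a metrics producer discovers
// the current coordinator through a znode and re-resolves it when it changes.
public class CoordinatorLocator implements Watcher {

  private static final String COORDINATOR_PATH = "/whirr/coordinator"; // placeholder

  private final ZooKeeper zk;
  private volatile String coordinatorAddress; // host:port to send metrics to

  public CoordinatorLocator(String zkConnectString) throws Exception {
    this.zk = new ZooKeeper(zkConnectString, 30000, this);
    refresh();
  }

  // Read the coordinator's address and leave a watch so we hear about changes.
  private void refresh() throws Exception {
    Stat stat = zk.exists(COORDINATOR_PATH, this);
    if (stat == null) {
      coordinatorAddress = null; // no coordinator elected yet
    } else {
      coordinatorAddress = new String(zk.getData(COORDINATOR_PATH, this, stat), "UTF-8");
    }
  }

  @Override
  public void process(WatchedEvent event) {
    try {
      refresh(); // coordinator appeared, moved or went away
    } catch (Exception e) {
      // a real implementation would back off and retry here
    }
  }

  public String getCoordinatorAddress() {
    return coordinatorAddress;
  }
}
{code}

The coordinator side would simply write its host:port into that (ephemeral) 
znode when it takes over, so producers always have a live address to push 
metrics to.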
That being said, I had not thought about Ganglia as the metrics display 
sub-system. I think it would be rather easy to start gmetad when a new 
coordinator starts and have the coordinator publish the metrics there (both 
per-node and aggregated), giving a nice view of cluster status almost for free.
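Purely as an illustration (hostnames are placeholders), the nodes' gmond could 
be pointed at the coordinator over unicast, and the gmetad on the coordinator 
would poll for the aggregated view:

{code}
# gmond.conf on every node: send metrics point-to-point to the current
# coordinator instead of joining a multicast group (no multicast on EC2).
udp_send_channel {
  # placeholder host; would be rewritten whenever the coordinator moves
  host = coordinator.internal
  port = 8649
  ttl = 1
}

udp_recv_channel {
  port = 8649
}

tcp_accept_channel {
  port = 8649
}

# gmetad.conf on the coordinator: poll the local gmond for the cluster view.
data_source "whirr-cluster" localhost:8649
{code}

Whenever the coordinator moves, the same zk watch could rewrite that host 
entry and restart gmond on each node.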
What do you think? 
 
 

> Scaling Monitor/Coordinator
> ---------------------------
>
>                 Key: WHIRR-238
>                 URL: https://issues.apache.org/jira/browse/WHIRR-238
>             Project: Whirr
>          Issue Type: New Feature
>          Components: core
>            Reporter: David Alves
>
> From the mailing list:
> General idea:
> Add an elastic scaling monitor and coordinator, i.e. a whirr process that 
> would be running on some or all of the nodes that:
>       - would collect load metrics (both generic and specific to each 
> application)
>       - would feed them through an elastic decision making engine (also 
> specific to each application as it depends on the specific metrics)
>       - would then act on those decisions by either expanding or contracting 
> the cluster.
>       Some specifics:
>       - it need not be completely distributed, i.e. it can have a specific 
> assigned node that will monitor/coordinate, but this node must not be fixed, 
> i.e. it could/should change if the previous coordinator leaves the cluster.
>       - each application would define the set of metrics that it emits and 
> use a local monitor process to feed them to the coordinator.
>       - the monitor process should emit some standard metrics (Disk I/O, CPU 
> Load, Net I/O, memory)
>       - the coordinator would have a pluggable decision-engine policy, also 
> defined by the application, that would consume metrics and make a decision.
>       - whirr would take care of requesting/releasing nodes and 
> adding/removing them from the relevant services.
>       Some implementation ideas:
>       - it could run on top of zookeeper. zk is already a requirement for 
> several services and would make it possible to reliably store coordinator 
> state so that another node can pick up if the previous coordinator leaves 
> the cluster.
>       - it could use Avro to serialize/deserialize metrics data 
>       - it should be optional, i.e. simply another service that the whirr cli 
> starts
>       - it would also be nice to have a monitor/coordinator web page that 
> would display metrics and cluster status in an aggregated view.
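As a purely illustrative sketch of the pluggable decision-engine policy 
described above (class and metric names are placeholders, not a proposed Whirr 
API), the contract the coordinator programs against could be as small as:

{code}
import java.util.Map;

// Illustrative only: a pluggable policy that consumes the latest aggregated
// metrics and decides whether the cluster should grow, shrink or stay put.
public interface ScalingPolicy {

  enum Decision { EXPAND, CONTRACT, NO_CHANGE }

  /**
   * @param metrics latest samples keyed by metric name (e.g. "cpu.load"),
   *                already aggregated across the cluster by the coordinator
   */
  Decision evaluate(Map<String, Double> metrics);
}

// Example application-specific policy: expand when the cluster looks saturated.
class CpuLoadPolicy implements ScalingPolicy {
  @Override
  public Decision evaluate(Map<String, Double> metrics) {
    Double load = metrics.get("cpu.load");
    if (load == null) return Decision.NO_CHANGE; // metric not reported yet
    if (load > 0.80) return Decision.EXPAND;     // saturated: request more nodes
    if (load < 0.20) return Decision.CONTRACT;   // mostly idle: release nodes
    return Decision.NO_CHANGE;
  }
}
{code}

Whirr would then translate EXPAND/CONTRACT into the request/release node and 
add/remove service steps mentioned above.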


        
