[
https://issues.apache.org/jira/browse/WHIRR-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994831#comment-12994831
]
David Alves commented on WHIRR-238:
-----------------------------------
I'm also for using whatever already exists. I've used both Ganglia and the
Hadoop metrics system in the past: Ganglia as a metrics gatherer/display, and
the Hadoop system for metrics collection/publishing.
I was thinking of using the Hadoop metrics system (even though I always end up
having to hack it a bit). It integrates well with Hadoop and HBase, and
provides a lot of functionality out of the box, namely publishing to Ganglia.
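As an aside, the Ganglia publishing side is mostly configuration. A minimal
hadoop-metrics.properties sketch for a single metrics context could look like
the following (the gmond host/port are placeholders, and Ganglia 3.1+ needs
GangliaContext31 instead of GangliaContext):

    # hadoop-metrics.properties: publish the "dfs" metrics context to Ganglia
    dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    dfs.period=10
    dfs.servers=gmond-host:8649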
Regarding Ganglia, I see it as an out-of-the-box solution for the monitoring
web page, but not as a solution for the problem of a distributed coordinator
with pluggable scaling rules.
> Scaling Monitor/Coordinator
> ---------------------------
>
> Key: WHIRR-238
> URL: https://issues.apache.org/jira/browse/WHIRR-238
> Project: Whirr
> Issue Type: New Feature
> Components: core
> Reporter: David Alves
>
> From the mailing list:
> General idea:
> Add an elastic scaling monitor and coordinator, i.e. a whirr process that
> would be running on some or all of the nodes that:
> - would collect load metrics (both generic and specific to each
> application)
> - would feed them through an elastic decision making engine (also
> specific to each application as it depends on the specific metrics)
> - would then act on those decisions by either expanding or contracting
> the cluster.
> Some specifics:
> - it need not be completely distributed, i.e. it can have a specific
> assigned node that monitors/coordinates, but this node must not be fixed:
> it could/should change if the previous coordinator leaves the cluster.
> - each application would define the set of metrics that it emits and
> use a local monitor process to feed them to the coordinator.
> - the monitor process should emit some standard metrics (Disk I/O, CPU
> Load, Net I/O, memory)
> - the coordinator would have a pluggable decision engine policy, also
> defined by the application, that would consume metrics and make a decision.
> - whirr would take care of requesting/releasing nodes and
> adding/removing them from the relevant services.
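A rough sketch of how the pieces described above could fit together; every
name here is hypothetical, and it assumes the coordinator simply polls a set
of per-node metric sources and hands the readings to an application-supplied
policy:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // A per-node source of readings, e.g. "cpu.load" -> 0.85. In practice the
    // values would probably come from the Hadoop metrics system rather than
    // being modelled from scratch.
    interface MetricsSource {
      String nodeId();
      Map<String, Double> poll();   // standard + application-specific metrics
    }

    enum ScalingDecision { EXPAND, CONTRACT, NO_OP }

    // The application-defined, pluggable decision engine.
    interface ScalingPolicy {
      ScalingDecision evaluate(List<Map<String, Double>> clusterMetrics);
    }

    // The part Whirr itself provides: requesting/releasing nodes and
    // adding/removing them from the relevant services.
    interface ClusterController {
      void addNodes(int count);
      void removeNodes(int count);
    }

    // The loop run on whichever node is currently elected coordinator.
    class Coordinator {
      private final List<MetricsSource> sources;
      private final ScalingPolicy policy;
      private final ClusterController controller;

      Coordinator(List<MetricsSource> sources, ScalingPolicy policy,
                  ClusterController controller) {
        this.sources = sources;
        this.policy = policy;
        this.controller = controller;
      }

      void runOnce() {
        List<Map<String, Double>> readings = new ArrayList<Map<String, Double>>();
        for (MetricsSource source : sources) {
          readings.add(source.poll());
        }
        switch (policy.evaluate(readings)) {
          case EXPAND:   controller.addNodes(1);    break;
          case CONTRACT: controller.removeNodes(1); break;
          default:       break;  // NO_OP
        }
      }
    }

The interesting bit is ScalingPolicy, which is where application-specific
rules (e.g. "expand when average CPU load stays above 0.8 for five minutes")
would plug in.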
> Some implementation ideas:
> - it could run on top of ZooKeeper. ZK is already a requirement for
> several services and would allow coordinator state to be stored reliably, so
> that another node can pick up if the previous coordinator leaves the cluster.
> - it could use Avro to serialize/deserialize metrics data
> - it should be optional, i.e. simply another service that the whirr cli
> starts
> - it would also be nice to have a monitor/coordinator web page that
> would display metrics and cluster status in an aggregated view.
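On the ZooKeeper idea: coordinator failover is usually handled with ephemeral
sequential znodes, where each candidate creates one under an election path and
the lowest sequence number is the coordinator. A rough, hypothetical sketch
(paths and error handling are placeholder-level, and a real implementation
would set a watch on the next-lowest znode instead of re-checking):

    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    class CoordinatorElection {
      private final ZooKeeper zk;
      private String myZnode;

      CoordinatorElection(ZooKeeper zk) {
        this.zk = zk;
      }

      // Join the election by creating an ephemeral sequential znode (the parent
      // path is assumed to exist). If ours has the lowest sequence number, this
      // node is the coordinator; when a coordinator dies, its ephemeral znode
      // disappears and another candidate takes over on the next check.
      boolean joinAndCheckLeadership() throws Exception {
        if (myZnode == null) {
          myZnode = zk.create("/whirr/coordinator/candidate-", new byte[0],
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        }
        List<String> candidates = zk.getChildren("/whirr/coordinator", false);
        Collections.sort(candidates);
        // create() returned the full path; compare against the smallest child.
        return myZnode.endsWith(candidates.get(0));
      }
    }

Avro would then slot in as the wire format for the metric readings the local
monitors ship to the coordinator; a simple record (node id, timestamp, and a
map of metric name to value) would probably be enough.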
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira