Matthias,

Thanks a lot for your answer. Yes, you win the bet :) I used swift, and I am currently struggling to disable collectd to make my cloud usable again! :))
I've seen STF (Service Telemetry Framework), but it seems a bit too complicated. I would have to set up an OKD cluster just to monitor my OpenStack; isn't that too much work? Have you tried it yourself?

If I understand correctly, your first and main suggestion means adding these files to my overcloud deploy command:

/usr/share/openstack-tripleo-heat-templates/environments/enable-legacy-telemetry.yaml
/usr/share/openstack-tripleo-heat-templates/environments/services/collectd.yaml

For performance tuning I've checked this page:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/deployment_recommendations_for_specific_red_hat_openstack_platform_services/config-recommend-telemetry_config-recommend-telemetry#config_telemetry-small-overcloud_config-recommend-telemetry

Is that what you mean?

If so, I should make my cloud usable again and just change GnocchiBackend to a path on a shared file system (i.e. NFS), because I have 4 controller nodes. The rest is exactly what I've done up to now.

Thanks a lot,
Khodayar

On Fri, Oct 23, 2020 at 10:01 AM Matthias Runge <[email protected]> wrote:

> On 22/10/2020 17:46, Khodayar Doustar wrote:
> > Hi everybody,
> >
> > I am searching for a good and useful method to monitor my 40 nodes cloud.
> >
> > I have tried
> >
> > - Prometheus + Grafana (with
> > https://github.com/openstack-exporter/openstack-exporter
> > <https://github.com/openstack-exporter/openstack-exporter>) but it
> > cannot monitor nodes load and cpu usage etc.
> > and
> > - Gnocchi +Collectd + Grafana but it enforces unbelievable load on nodes
> > and make the whole cloud completely unusable!
> >
> > I've tried to use Graphite + Grafana but I failed.
> >
> > Do you have any suggestions?
> >
> Hi,
>
> yes, I have some opinions here.
>
> My proposal here is:
>
> - use collectd to collect low level metrics from your baremetal machines
> - use ceilometer to collect OpenStack related info, like project usage,
> etc. That is nothing you'd get by using node-exporter
> - hook them both together and send metrics over to something called
> Service Telemetry Framework. The configuration *is* included in tripleo.
> The website has documentation available
> https://infrawatch.github.io/documentation
> - graphite + grafana (plus collectd) is also a single node setup and
> won't provide you reliability.
> - collectd also provides the ability to send events, which can be acted
> on. That is not included if you use node-exporter, openstack-exporter
> etc. Prometheus monitoring creates events from metrics, but will be slow
> to detect failed components.
>
> Since prometheus is meant to be single server, there is no HA per se in
> prometheus. That makes handling prometheus on standalone machines a bit
> awkward, or you'd have a infrastructure taking care of that.
>
> In your tests with gnocchi, collectd and grafana, I bet you used swift
> as backend for gnocchi storage. That is not a good idea and may lead to
> bad performance.
>
> Matthias
>
> --
> Matthias Runge <[email protected]>
>
> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
> Commercial register: Amtsgericht Muenchen, HRB 153243,
> Man.Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neil
>
> _______________________________________________
> users mailing list
> [email protected]
> http://lists.rdoproject.org/mailman/listinfo/users
>
> To unsubscribe: [email protected]
>
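
For reference, the Gnocchi backend change mentioned in the reply above could be written as a small extra environment file. This is only a minimal sketch under stated assumptions: the file name is hypothetical, `file` is assumed to be the backend value that goes with a path on shared storage, and the NFS mount itself is assumed to be set up separately on every controller.

```yaml
# gnocchi-file-backend.yaml (hypothetical file name, not from the thread)
# Assumption: the directory Gnocchi writes to is already mounted from the
# shared file system (NFS) on all 4 controllers; that mount is not shown here.
parameter_defaults:
  # Move Gnocchi storage off swift and onto the file driver.
  GnocchiBackend: file
```

A file like this would presumably be passed to the overcloud deploy command with an additional `-e` option, after the enable-legacy-telemetry.yaml and collectd.yaml environment files listed above.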
