Hi, yes of course I'm using STF, and it's not complicated. It's always a good idea to separate your monitoring stack from the monitored infrastructure. How would you know your stack is down, if notifications are also sent from that stack?
With the tripleo-heat-templates you linked, you basically enable legacy telemetry (ceilometer, aodh, gnocchi). If you are running 40 computes, that is not a small stack anymore. I would recommend using ceph as the backend.

Also, depending on your use case and your collectd settings, you may want to poll less often, i.e. raise the interval. The parameter is CollectdDefaultPollingInterval. I have set it here to something like 5 seconds, but in your case I would suggest 600 (the same as for Ceilometer).

Matthias

On 23/10/2020 11:09, Khodayar Doustar wrote:
> Matthias,
>
> Thanks a lot for your answer.
> Yes, you win the bet :) I've used swift and am currently struggling to
> disable collectd to make my cloud usable again! :))
>
> I've seen this STF (Service Telemetry Framework) but it seems a little
> bit too complicated. I should implement an OKD cluster to monitor my
> openstack, isn't it too much work?
> Have you tried it yourself?
>
> If I understand correctly, with your first and main opinion you mean
> adding these files to my overcloud deploy command:
>
> /usr/share/openstack-tripleo-heat-templates/environments/enable-legacy-telemetry.yaml
> /usr/share/openstack-tripleo-heat-templates/environments/services/collectd.yaml
>
> and for performance tuning I've checked this page:
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/deployment_recommendations_for_specific_red_hat_openstack_platform_services/config-recommend-telemetry_config-recommend-telemetry#config_telemetry-small-overcloud_config-recommend-telemetry
>
> Is that what you mean?
> If so I should make my cloud usable again and just change GnocchiBackend
> to a path to a file on a shared file system (i.e. NFS) because I have 4
> controller nodes, because the rest is exactly what I've done up to now.
>
> Thanks a lot,
> Khodayar
>
> On Fri, Oct 23, 2020 at 10:01 AM Matthias Runge <[email protected]> wrote:
>
> > On 22/10/2020 17:46, Khodayar Doustar wrote:
> > > Hi everybody,
> > >
> > > I am searching for a good and useful method to monitor my 40 nodes
> > > cloud.
> > >
> > > I have tried
> > >
> > > - Prometheus + Grafana (with
> > >   https://github.com/openstack-exporter/openstack-exporter) but it
> > >   cannot monitor nodes load and cpu usage etc.
> > > and
> > > - Gnocchi + Collectd + Grafana but it enforces unbelievable load on
> > >   nodes and makes the whole cloud completely unusable!
> > >
> > > I've tried to use Graphite + Grafana but I failed.
> > >
> > > Do you have any suggestions?
> >
> > Hi,
> >
> > yes, I have some opinions here.
> >
> > My proposal here is:
> >
> > - use collectd to collect low level metrics from your baremetal machines
> > - use ceilometer to collect OpenStack related info, like project usage,
> >   etc. That is nothing you'd get by using node-exporter
> > - hook them both together and send metrics over to something called
> >   Service Telemetry Framework. The configuration *is* included in tripleo.
> >   The website has documentation available
> >   https://infrawatch.github.io/documentation
> > - graphite + grafana (plus collectd) is also a single node setup and
> >   won't provide you reliability.
> > - collectd also provides the ability to send events, which can be acted
> >   on. That is not included if you use node-exporter, openstack-exporter
> >   etc. Prometheus monitoring creates events from metrics, but will be slow
> >   to detect failed components.
> >
> > Since prometheus is meant to be single server, there is no HA per se in
> > prometheus. That makes handling prometheus on standalone machines a bit
> > awkward, or you'd have an infrastructure taking care of that.
> >
> > In your tests with gnocchi, collectd and grafana, I bet you used swift
> > as backend for gnocchi storage. That is not a good idea and may lead to
> > bad performance.
> >
> > Matthias
> >
> > --
> > Matthias Runge <[email protected]>
> >
> > Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
> > Commercial register: Amtsgericht Muenchen, HRB 153243,
> > Man.Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neil
> >
> > _______________________________________________
> > users mailing list
> > [email protected]
> > http://lists.rdoproject.org/mailman/listinfo/users
> >
> > To unsubscribe: [email protected]

--
Matthias Runge <[email protected]>
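
For reference, the tuning suggested at the top of this reply could be collected in one extra environment file. This is only a sketch under assumptions: the file name is hypothetical, and the value rbd for GnocchiBackend is an assumption about how "ceph as backend" maps onto that parameter, so please check both parameters against the tripleo-heat-templates version you actually deploy.

```yaml
# telemetry-tuning.yaml (hypothetical file name)
parameter_defaults:
  # Collect every 600 seconds, the value suggested above for a 40-node cloud.
  CollectdDefaultPollingInterval: 600
  # Assumption: "rbd" selects ceph as the gnocchi storage backend.
  GnocchiBackend: rbd
```

Such a file would be passed to the overcloud deploy command with -e, after the enable-legacy-telemetry.yaml and collectd.yaml environment files mentioned in the thread, so that its values take precedence.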
