On Wed, Mar 29, 2017 at 2:10 AM, Ionut Biru - Fleio <io...@fleio.com> wrote:
> I'm not using influxdb, just basic configuration generated by openstack
> ansible, which enables file storage by default.
>
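(For context: the default being referred to is Gnocchi's file storage driver. A hypothetical gnocchi.conf fragment for that default might look like the following; the basepath value is an illustrative assumption, not taken from Ionut's actual config.)

```ini
# Hypothetical gnocchi.conf fragment showing the file-driver default that
# OpenStack-Ansible generates; file_basepath here is an assumed value.
[storage]
driver = file
file_basepath = /var/lib/gnocchi
```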
Oops, I homed in on the influxdb portion of the config. I see the driver is
set to file.

> The reason for bumping those values was to process a lot of measures and
> 400 seems a high number at that time.
>

I was actually referring to the assigned values of 16 for TASKS_PER_WORKER
and 4 for BLOCK_SIZE. My assumption is that someone has done some sort of
analysis or testing to show what good values are here. That is not to
discount that a higher value has obviously helped your situation. It sounds
like it is absolutely worth investigating what the optimal setting is, and
what the trade-offs of bumping this value are.

Let's gather a few more data points about your environment for future
reference. You had 10 workers, so 10 * 16 = 160 total tasks you could handle
per metric-processing wake-up period, which could not sustain your 70
instances. Do you know how many metrics you have per instance, and whether
the instances had other resources that would create more metrics (NICs,
volumes, disks)? Did you have more than one machine hosting metricd workers?
(That adds to the capacity.) Is the setup bare metal or some sort of
virtualized cloud?

> I did use the below values without any impact
>
> metric_processing_delay = 0
>

I have seen values of less than 10s here actually become detrimental, though
I did not have the time to root-cause why that occurs.

> metric_reporting_delay = 1
>
> metric_cleanup_delay = 10
>
> I'm open to applying any configuration modification to my setup in order
> to resolve my issue without any code modification (like the one I made).
>

The tunings I currently know of trade system resources for higher capacity.
Given that, I'd like to understand what, if anything, was traded when you
bumped those constants in the code base. Did you happen to have any
additional telemetry collected on your cloud that can help characterize
Gnocchi's resource consumption after your changes?

Also, what are you using to monitor the Gnocchi backlog?
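(The capacity arithmetic above can be sketched as a quick back-of-the-envelope model. This is only the arithmetic from this thread, not Gnocchi's actual scheduler, and the metrics-per-instance figure is an assumption for illustration.)

```python
# Back-of-the-envelope model of metricd scheduling capacity, using the
# numbers from this thread. A sketch only -- not Gnocchi's real scheduler.

def tasks_per_cycle(hosts, workers_per_host, tasks_per_worker):
    """Metrics that can be scheduled in one metric-processing wake-up."""
    return hosts * workers_per_host * tasks_per_worker

# Stock TASKS_PER_WORKER = 16, one host, 10 workers:
capacity = tasks_per_cycle(hosts=1, workers_per_host=10, tasks_per_worker=16)
print(capacity)  # 160

# Assume ~10 metrics per instance (an illustrative figure; NICs, volumes
# and disks would push it higher) across the 70 reported instances:
demand = 70 * 10
print(demand)  # 700

# Demand exceeding per-cycle capacity means the backlog grows every cycle,
# consistent with the 300k queued measures reported in the thread.
print(demand > capacity)  # True
```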
I use a collectd plugin available here:
https://github.com/akrzos/collectd-gnocchi-status

> ________________________________
> From: Alex Krzos <akr...@redhat.com>
> Sent: Tuesday, March 28, 2017 8:19:58 PM
> To: Ionut Biru - Fleio
> Cc: openstack-operators@lists.openstack.org
> Subject: Re: [Openstack-operators] scaling gnocchi metricd
>
> This is interesting, thanks for sharing. I assume you're using an
> influxdb storage driver, correct? I have also wondered if there was a
> specific reason for the TASKS_PER_WORKER and BLOCK_SIZE values.
>
> Also, did you have to adjust your metric_processing_delay?
>
>
> Alex Krzos | Performance Engineering
> Red Hat
> Desk: 919-754-4280
> Mobile: 919-909-6266
>
>
> On Tue, Mar 28, 2017 at 3:28 PM, Ionut Biru - Fleio <io...@fleio.com> wrote:
>> Hello,
>>
>> I have a cloud under administration. My setup is fairly basic: I deployed
>> OpenStack using OpenStack-Ansible, I'm currently on Newton, and I'm
>> planning to upgrade to Ocata.
>>
>> I'm having a problem with gnocchi metricd falling behind on processing
>> metrics.
>>
>> Gnocchi config: https://paste.xinu.at/f73A/
>>
>> When I'm using the default number of workers (the CPU count), the
>> "storage/total number of measures to process" figure keeps growing; last
>> time I had 300k in the queue. It seems that the tasks are not rescheduled
>> in a way that processes them all in time: metricd processes a couple of
>> metrics right after they arrive from ceilometer, and after that the rest
>> are kept in the queue. I only have 10 compute nodes with about 70
>> instances.
>>
>> In order to process the backlog I had to set workers to a very high
>> number (100) and keep restarting metricd so the measures would get
>> processed, but this method is very CPU- and memory-intensive. Luckily I
>> found another method that works quite well.
>>
>> https://git.openstack.org/cgit/openstack/gnocchi/tree/gnocchi/cli.py?h=stable/3.1#n154
>>
>> I have modified TASKS_PER_WORKER and BLOCK_SIZE to 400 and now metricd
>> keeps processing them.
>>
>> I'm not sure yet whether this is a bug, but my question is: how do you
>> scale gnocchi metricd in order to process a lot of resources and metrics?
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
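(A footnote on monitoring the backlog without collectd: the figure the plugin linked above reports comes from Gnocchi's REST status endpoint, GET /v1/status. A minimal sketch of fetching and parsing it is below; the endpoint URL and token are placeholders, and the payload shape shown is my understanding of the status API's storage summary, so verify against your Gnocchi version.)

```python
import json
import urllib.request

def parse_backlog(status):
    """Pull the processing backlog out of a /v1/status payload."""
    summary = status["storage"]["summary"]
    return summary["measures"], summary["metrics"]

def fetch_status(endpoint, token):
    """GET /v1/status from a Gnocchi API endpoint (placeholder auth token)."""
    req = urllib.request.Request(
        endpoint.rstrip("/") + "/v1/status",
        headers={"X-Auth-Token": token, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example payload in the assumed shape of a /v1/status response:
sample = {"storage": {"summary": {"measures": 300000, "metrics": 7000}}}
measures_backlog, metrics_backlog = parse_backlog(sample)
print(measures_backlog, metrics_backlog)  # 300000 7000
```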