I recently ran into trouble with a large backlog. What I found was that at some 
point the backlog got too large for gnocchi to function effectively.  When using 
ceph, the list of metric objects is kept in an omap object, which is normally a 
quick and efficient way to store this list.  However, at some point the list 
grows too large to be managed by the leveldb that implements the omap k/v 
store.  I finally moved to some SSDs to get enough IOPS for leveldb/omap to 
function.  My guess is that, if you are using ceph, the increased number of 
metrics grabbed per pass reduced the number of times a now-expensive operation 
is performed.  Indications are that the new bluestore should make omap scale 
much better, but it isn't slated to go stable for a few months, with the 
release of Luminous.
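
To make that guess concrete, here is a rough sketch of the arithmetic only. 
It assumes each pass fetches the metric list once and then handles a fixed 
number of metrics; the function name and the numbers are made up for 
illustration and are not taken from gnocchi itself.

```python
import math


def listing_passes(backlog_metrics, metrics_per_pass):
    """Return how many list fetches are needed to drain the backlog.

    Hypothetical model: one (expensive) omap listing per pass, and each
    pass handles at most `metrics_per_pass` metrics.
    """
    if metrics_per_pass <= 0:
        raise ValueError("metrics_per_pass must be positive")
    return math.ceil(backlog_metrics / metrics_per_pass)


# Illustrative numbers only: a backlog of 1000 metrics drained with a
# small batch versus a large batch per pass.
small_batch = listing_passes(1000, 16)
large_batch = listing_passes(1000, 400)
print(small_batch, large_batch)
```

If the listing is the slow part, fewer passes means fewer slow listings, 
which would match what you saw after raising the values.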


> On Mar 28, 2017, at 2:28 PM, Ionut Biru - Fleio <io...@fleio.com> wrote:
> 
> Hello,
> 
> I have a cloud under administration. My setup is fairly basic: I have 
> deployed OpenStack using OpenStack Ansible. Currently I'm on Newton and 
> planning to upgrade to Ocata.
> 
> I'm having a problem with gnocchi metricd falling behind on processing 
> metrics.
> 
> Gnocchi config: https://paste.xinu.at/f73A/
> 
> If I'm using the default number of workers (CPU count), the number of 
> "storage/total number of measures to process" keeps growing; last time I had 
> 300k in the queue. It seems that the tasks are not rescheduled in order to 
> process them all in time: metricd processes a couple of metrics after they 
> are received from ceilometer, and after that they are kept in the queue. I 
> only have 10 compute nodes with about 70 instances.
> 
> In order to process them I had to set workers to a very high number (100) 
> and keep restarting metricd, but this method is very CPU and memory 
> intensive. Luckily I found another method that works quite well.
> 
> https://git.openstack.org/cgit/openstack/gnocchi/tree/gnocchi/cli.py?h=stable/3.1#n154
> 
> I have modified TASKS_PER_WORKER and BLOCK_SIZE to 400 and now metricd keeps 
> processing them.
> 
> I'm not sure yet whether this is a bug, but my question is: how do you guys 
> scale gnocchi metricd in order to process a lot of resources and metrics?
> 
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org 
> <mailto:OpenStack-operators@lists.openstack.org>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators 
> <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators>

