Hi Mikhail,

Thank you for the extra details. I'll continue to look into this.
With the daily bumps when you do the log rotation, I assume you aren't
reloading zuul at that point and the freed memory is likely due to
another process?

Cheers,
Josh

On Tue, Mar 8, 2016 at 10:17 AM, Mikhail Medvedev <[email protected]> wrote:

> On Wed, Feb 10, 2016 at 10:57 AM, James E. Blair <[email protected]> wrote:
> > Michael Still <[email protected]> writes:
> >
> >> On Tue, Feb 9, 2016 at 4:59 AM, Joshua Hesketh <[email protected]> wrote:
> >>
> >>> On Thu, Feb 4, 2016 at 2:44 AM, James E. Blair <[email protected]> wrote:
> >>>>
> >>>> On the subject of clearing the cache more often, I think we may not
> >>>> want to wipe out the cache more often than we do now -- in fact, I
> >>>> think we may want to look into ways to keep from doing even that,
> >>>> because whenever we reload now, Zuul slows down considerably as it
> >>>> has to query Gerrit again for all of the data previously in its
> >>>> cache.
> >>>>
> >>>
> >>> I can see a lot of 3rd parties or simpler CIs not needing to reload
> >>> zuul very often, so this cache would never get cleared. Perhaps
> >>> cached objects should have an expiry time (of a day or so) and can
> >>> be cleaned up periodically? Additionally, if clearing the cache on a
> >>> reload is causing pain, maybe we should move the cache into the
> >>> scheduler and keep it between reloads?
> >>>
> >>
> >> Do you guys use oslo at all? I ask because the oslo memcache stuff
> >> does exactly this, so it should be trivial to implement if you don't
> >> mind depending on oslo.
> >
> > One of the main things we use the cache for is to ensure that every
> > change is represented by a single Change object in Zuul's memory. The
> > graph of enqueued Items link to their respective Changes which may
> > link to each other due to dependencies. When something changes in
> > Gerrit, we want that reflected immediately and consistently in all of
> > the objects in that graph. Using the cache means that every time we
> > add a new Change object to that graph, we use the same object for a
> > given change.
> >
> > This is why we can't use time-based expiry -- we must not drop objects
> > from the cache if they are still in the graph. Otherwise we will
> > create new duplicative objects and the ones still in the graph will
> > not be updated.
> >
> > Perhaps we should change these objects to something more ephemeral
> > that can proxy for some other mechanism that can operate more like a
> > traditional cache (with time-based expiry). But I think changes to
> > this system should happen in Zuulv3 -- it works well enough for Zuulv2
> > for now.
> >
> > -Jim
> >
>
> We are one of the third-party CIs and are using "Zuul version:
> 2.1.1.dev123", which is one commit after [1]. That one commit after is
> not in tree - I am applying [2] on top.
>
> The VM has 8GB of RAM. The zuul-server memory footprint goes up
> consistently over the course of a week. Normally it takes about 3-4
> days to get over 3GB. About a week ago I witnessed zuul-server get to
> 95% of RAM, at which point the kernel started killing other processes.
> The graph [3] shows memory, and it reflects zuul-server consumption.
> The daily bumps on the graph are the daily cron doing log rotation
> etc, possibly flushing caches.
>
> I can not say 100% that it is still the leak. It could simply be that
> zuul-server requires more RAM now.
>
> [1] https://review.openstack.org/#q,I81ee47524cda71a500c55a95a2280f491b1b63d9,n,z
> [2] https://review.openstack.org/#q,If3a418fa2d4993a149d454e02a9b26529e4b6825,n,z
> [3] http://imgur.com/SzqSA1H
>
> Mikhail Medvedev (mmedvede)
>
> _______________________________________________
> OpenStack-Infra mailing list
> [email protected]
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
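
To illustrate the constraint Jim describes in the quoted text above, here
is a minimal Python sketch. It is not Zuul's actual code: the Change and
ChangeCache classes, the needs attribute and the prune() method are all
hypothetical names made up for illustration. It shows an identity cache
that always returns the same object for a given change number, and that
prunes by reachability from the enqueued changes rather than by age, so
objects still in the graph are never replaced by duplicates.

```python
class Change:
    """Hypothetical stand-in for a Gerrit change held in memory."""

    def __init__(self, number):
        self.number = number
        self.needs = []  # other Change objects this change depends on


class ChangeCache:
    """Hypothetical identity cache: one Change object per change number."""

    def __init__(self):
        self._changes = {}

    def get(self, number):
        # Always hand back the same object for a given change number,
        # creating it only on first use.
        change = self._changes.get(number)
        if change is None:
            change = Change(number)
            self._changes[number] = change
        return change

    def prune(self, enqueued):
        """Drop only entries unreachable from the enqueued changes.

        Returns the set of change numbers that were kept.
        """
        keep = set()
        stack = list(enqueued)
        while stack:
            change = stack.pop()
            if change.number in keep:
                continue
            keep.add(change.number)
            stack.extend(change.needs)  # follow dependency links
        for number in list(self._changes):
            if number not in keep:
                del self._changes[number]
        return keep
```

With something like this, a periodic cleanup could call prune() with the
changes currently enqueued. A pure time-based expiry, by contrast, could
evict an entry that an enqueued item still points at, and the next get()
would then build a second object for the same change.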
