On Mon, Dec 8, 2014 at 8:09 PM, Lennart Poettering <lenn...@poettering.net> wrote:
> On Sun, 30.11.14 14:38, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:
>
>> Hi,
>>
>> We are experiencing an unbreakable loop in manager_dispatch_gc_queue.
>> Problem happens when systemd runs in sysV compatibility mode (Porky
>> enables this).
>>
>> Seems like manager_dispatch_gc_queue's while loop gets stuck and seems
>> like unit_gc_sweep cannot make a decision about the unit. As a result,
>> it marks the unit with offset_unsure and adds the unit back to gc
>> queue.
>>
>> If I am reading the code correctly recursive unit_gc_sweep will never
>> be able to remove the unit from the gc queue if it is referenced by
>> another unit and if another unit is referenced by the unit.
>>
>> A is referenced by B
>> B is referenced by A
>
> So in this case first A will be processed by the GC sweep, it will
> follow the link to B while setting the state to IN_PATH and invoke the
> GC sweep on that. B will then be set to IN_PATH too. GC sweep now
> follows its link back, and ends up at A again, but this time returns
> quickly because its state is set to IN_PATH. Due to this, it will then
> set B's state to UNSURE, and return to A, which in effect will now be
> set to UNSURE too. Now, we return into the GC queue dispatch call, which
> will notice that it is UNSURE and upgrade that to BAD, and kill it
> because there's nothing in the unit's dependency network that is
> clearly GOOD, and hence it should be removed.
>
> The essence of cycle breaking here is really in
> manager_dispatch_gc_queue() which upgrades UNSURE to BAD in the end. I
> am not seeing how this could end up in an endless loop hence.
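For anyone following the thread, the sweep described above can be modelled roughly as below. This is only a simplified Python sketch of the marking logic as explained in the quoted text, not systemd's C code: the names `Unit`, `Mark`, `sweep` and `dispatch`, and the `needed` flag standing in for "something still wants this unit", are made up for illustration.

```python
from enum import Enum


class Mark(Enum):
    NONE = 0
    IN_PATH = 1   # currently on the recursion path
    UNSURE = 2    # no decision could be made during this sweep
    GOOD = 3      # must be kept
    BAD = 4       # can be cleaned up


class Unit:
    def __init__(self, name, needed=False):
        self.name = name
        self.needed = needed        # hypothetical stand-in for "still in use"
        self.referenced_by = []     # units that reference this one
        self.mark = Mark.NONE


def sweep(u):
    # Already decided, or already on the current path: return quickly.
    if u.mark in (Mark.GOOD, Mark.BAD, Mark.IN_PATH):
        return
    if u.needed:
        u.mark = Mark.GOOD
        return
    u.mark = Mark.IN_PATH
    all_bad = True
    for other in u.referenced_by:
        sweep(other)
        if other.mark is Mark.GOOD:
            u.mark = Mark.GOOD      # a GOOD referrer keeps this unit alive
            return
        if other.mark is not Mark.BAD:
            all_bad = False
    u.mark = Mark.BAD if all_bad else Mark.UNSURE


def dispatch(queue):
    """Sweep each queued unit; an UNSURE result at the anchor becomes BAD."""
    removed = []
    for u in queue:
        sweep(u)
        if u.mark is Mark.UNSURE:
            u.mark = Mark.BAD       # the cycle-breaking step
        if u.mark is Mark.BAD:
            removed.append(u.name)
    return removed


# A is referenced by B, B is referenced by A, nothing else wants either.
a, b = Unit("A"), Unit("B")
a.referenced_by.append(b)
b.referenced_by.append(a)
print(dispatch([a, b]))  # ['A', 'B']
```

In this model the A/B cycle is collected because the dispatch step turns the anchor's UNSURE into BAD. Note that a unit left UNSURE is swept again the next time it is reached, which is one plausible source of repeated work on a densely linked graph.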

I have debugged it more and, as you said, there is no bug in the code, but
it takes so long to get out of unit_gc_sweep that I thought it was an
endless loop.

Attached is my patch on 216, and
https://drive.google.com/file/d/0B_uiALgWpGXtZ0VidURxSnVhcDA/view?usp=sharing
is a part of the log after the patch. It has been 3 hours since I issued
"systemctl isolate", and according to the logs I can see that the garbage
collection logic is making its way back up. I guess it will eventually
resolve itself, but only after many hours. (Search for "- -"; it happens
every 300.000 lines.)

The problem seems to have been introduced by "95ed329" - Move handling of
sysv initscripts to a generator.

This is entirely due to how the sysV generator links services, but I think
GC slowness can happen on any complex system with many units linked with
each other.

Thoughts?
Umut

>
>>
>> We have this circular referenced by dependency between units and I am
>> quite sure they are due to sysV compatibility.
>>
>> I know that systemd does not allow circular dependency between units
>> (ex, wants, or after) but do we allow circular referenced by
>> dependency? If so, then it is expected that manager_dispatch_gc_queue
>> gets stuck.
>>
>> We can reproduce it on 216/217 when we isolate a target.
>>
>> Note: Line
>> http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n875
>> should be before
>> http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n872
>> since unit_gc_sweep() sets the u->in_gc_queue = true if it cannot make
>> a decision and we set it back to false.
>
> This is intended. After the sweep returned back to the anchor we can
> make our decision: either add the unit to the cleanup queue, in which
> case it should be removed from the GC queue, or it is reinstated as
> a good unit that should continue to exist, in which case it should be
> removed from the GC queue too.
>
> Can't see a bug here...
>
> Can you elaborate on how precisely you are encountering the GC loop?
>
> Lennart
>
> --
> Lennart Poettering, Red Hat

0001-Debugging-gc_sweep 1.03.18 PM.patch
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel