On Wed, 10.12.14 15:22, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:

Sorry for the late reply.

> On Mon, Dec 8, 2014 at 8:09 PM, Lennart Poettering
> <lenn...@poettering.net> wrote:
> > On Sun, 30.11.14 14:38, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:
> >
> >> Hi,
> >>
> >> We are experiencing an unbreakable loop in manager_dispatch_gc_queue.
> >> Problem happens when systemd runs in sysV compatibility mode (Porky
> >> enables this).
> >>
> >> Seems like manager_dispatch_gc_queue's while loop gets stuck and seems
> >> like unit_gc_sweep cannot make a decision about the unit. As a result,
> >> it marks the unit with offset_unsure and adds the unit back to gc
> >> queue.
> >>
> >> If I am reading the code correctly recursive unit_gc_sweep will never
> >> be able to remove the unit from the gc queue if it is referenced by
> >> another unit and if another unit is referenced by the unit.
> >>
> >> A is referenced by B
> >> B is referenced by A
> >
> > So in this case first A will be processed by the GC sweep, it will
> > follow the link to B while setting the state to IN_PATH and invoke the
> > GC sweep on that. B will then be set to IN_PATH too. GC sweep now
> > follows its link back, and ends up at A again, but this time returns
> > quickly because its state is set to IN_PATH. Due to this, it will then
> > set B's state to UNSURE, and return to A, which in effect will now be
> > set to UNSURE too. Now, we return into the GC queue dispatch call,
> > which will notice that it is UNSURE and upgrade that to BAD, and kill
> > it because there's nothing in the unit's dependency network that is
> > clearly GOOD, and hence it should be removed.
> >
> > The essence of cycle breaking here is really in
> > manager_dispatch_gc_queue() which upgrades UNSURE to BAD in the end. I
> > am not seeing how this could end up in an endless loop hence.
>
> I have debugged it more and as you have said there is no bug in code
> but it takes so long to go out of unit_gc_sweep I thought there is a
> forever loop.
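
For reference, here is the A/B cycle case from above as a toy model in Python. This is only a sketch of the marking logic as I described it, not the C code in systemd: the names Mark, Unit, pinned, gc_sweep and dispatch_gc are made up for illustration, and "pinned" is a hypothetical stand-in for whatever makes a unit clearly GOOD.

```python
from enum import Enum


class Mark(Enum):
    NONE = 0
    IN_PATH = 1
    UNSURE = 2
    GOOD = 3
    BAD = 4


class Unit:
    def __init__(self, name, pinned=False):
        self.name = name
        self.pinned = pinned        # hypothetical: "something still needs this unit"
        self.referenced_by = []     # units that hold a reference to this one
        self.mark = Mark.NONE


def gc_sweep(u):
    # Already decided, or currently on the recursion path: return quickly.
    if u.mark in (Mark.GOOD, Mark.BAD, Mark.IN_PATH):
        return
    if u.pinned:
        u.mark = Mark.GOOD
        return

    u.mark = Mark.IN_PATH
    all_bad = True
    for other in u.referenced_by:
        gc_sweep(other)
        if other.mark is Mark.GOOD:
            u.mark = Mark.GOOD      # a GOOD referrer keeps this unit alive
            return
        if other.mark is not Mark.BAD:
            all_bad = False         # referrer is IN_PATH or UNSURE

    u.mark = Mark.BAD if all_bad else Mark.UNSURE


def dispatch_gc(u):
    """Sweep from u; return True if u should be collected."""
    gc_sweep(u)
    if u.mark is Mark.UNSURE:
        u.mark = Mark.BAD           # the cycle-breaking step: UNSURE becomes BAD
    return u.mark is Mark.BAD


# A is referenced by B, B is referenced by A, nothing else holds either.
a, b = Unit("A"), Unit("B")
a.referenced_by.append(b)
b.referenced_by.append(a)
print(dispatch_gc(a))   # True
```

In this model the sweep over the two-unit cycle terminates after three calls to gc_sweep, and it is the dispatch step, not the sweep, that makes the final decision.
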
>
> Attached is my patch on 216 and
> https://drive.google.com/file/d/0B_uiALgWpGXtZ0VidURxSnVhcDA/view?usp=sharing
> is a part of the log after patch.
>
> It has been 3 hours since I issued "systemctl isolate" and according
> to the logs I can see that garbage collection logic is making its way
> back up. I guess it will eventually resolve itself but after so many
> hours.

Hmm, so, you mean the code works correctly but scales really badly?
How many units do you have?

>
> (Search for "- -" and it is happening every 300.000 lines)
>
> Problem seemed to be introduced on "95ed329" - Move handling of sysv
> initscripts to a generator.

Hmm, what precisely do the deps that the generator creates for you
look like? Any chance you can run
"/usr/lib/systemd/system-generators/systemd-sysv-generator /tmp/foo /tmp/foo /tmp/foo",
and check precisely what deps it generates in /tmp/foo?

I have never seen the GC scale this badly...

> This is totally due to how sysV generator is linking services but I
> think slowness on GC can happen on a complex system with many units
> linked with each other.
>
> Thoughts?

I am puzzled, quite frankly...

Lennart

-- 
Lennart Poettering, Red Hat