Re: [systemd-devel] forever loop during garbage collection
On Wed, 10.12.14 15:22, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:

Sorry for the late reply.

> On Mon, Dec 8, 2014 at 8:09 PM, Lennart Poettering lenn...@poettering.net wrote:
> > [...]
> > Hence I am not seeing how this could end up in an endless loop.
>
> I have debugged it more and, as you said, there is no bug in the code, but it takes so long to get out of unit_gc_sweep that I thought there was a forever loop.
>
> Attached is my patch on 216, and https://drive.google.com/file/d/0B_uiALgWpGXtZ0VidURxSnVhcDA/view?usp=sharing is a part of the log after the patch. It has been 3 hours since I issued systemctl isolate, and according to the logs I can see that the garbage collection logic is making its way back up. I guess it will eventually resolve itself, but only after many hours.

Hmm, so, you mean the code works correctly but scales really badly? How many units do you have?

> (Search for "- -"; it appears every 300,000 lines.)
>
> The problem seems to have been introduced in 95ed329 ("Move handling of sysv initscripts to a generator").

Hmm, what precisely do the deps the generator creates for you look like? Any chance you can run /usr/lib/systemd/system-generators/systemd-sysv-generator /tmp/foo /tmp/foo /tmp/foo and check which deps it generates in /tmp/foo? I have never seen the GC scale this badly...

> This is entirely due to how the sysV generator links services, but I think GC slowness can also happen on a complex system with many interlinked units. Thoughts?

I am puzzled, quite frankly...

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
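Lennart's question (does the code work correctly but scale really badly?) can be illustrated with a small, hypothetical model. This sketch assumes that only GOOD, BAD and IN_PATH stop the recursion, so a unit left UNSURE by one path is explored again whenever another path reaches it. Whether systemd 216 behaves exactly like this is an assumption, and the names sweep_dense() and count_sweep_calls() are invented for the example. On n units that all reference one another and none of which is needed on its own, the number of sweep calls then grows roughly factorially. A variant that treats UNSURE as final within one pass is included only for comparison, not as a proposed fix; it stays at 1 + n(n-1) calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not systemd code: n units, none of which is needed
 * on its own, and every unit is referenced by every other unit. */

#define MAX_UNITS 10

enum gc_state { GC_NONE, GC_IN_PATH, GC_UNSURE, GC_GOOD, GC_BAD };

static enum gc_state state[MAX_UNITS];
static unsigned long calls;   /* how many times sweep_dense() was entered */

static void sweep_dense(size_t u, size_t n, bool stop_at_unsure) {
        calls++;

        /* GOOD, BAD and IN_PATH always end the recursion. UNSURE does so only
         * in the comparison variant; otherwise an UNSURE unit is explored again. */
        if (state[u] == GC_GOOD || state[u] == GC_BAD || state[u] == GC_IN_PATH ||
            (stop_at_unsure && state[u] == GC_UNSURE))
                return;

        state[u] = GC_IN_PATH;
        for (size_t other = 0; other < n; other++)
                if (other != u)
                        sweep_dense(other, n, stop_at_unsure);

        /* No referrer is ever needed and none becomes BAD (each one is
         * IN_PATH or UNSURE when checked), so the unit stays undecided. */
        state[u] = GC_UNSURE;
}

/* Number of sweep calls made when unit 0 is the anchor of a GC pass. */
static unsigned long count_sweep_calls(size_t n, bool stop_at_unsure) {
        assert(n >= 1 && n <= MAX_UNITS);
        for (size_t i = 0; i < n; i++)
                state[i] = GC_NONE;
        calls = 0;
        sweep_dense(0, n, stop_at_unsure);
        return calls;
}
```

In this model, count_sweep_calls(8, false) returns 95901 while count_sweep_calls(8, true) returns 57. Densely cross-referenced units that all stay undecided are the worst case, which fits the observation that the sweep eventually finishes but takes hours.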
Re: [systemd-devel] forever loop during garbage collection
Ping?

On Wednesday, December 10, 2014, Umut Tezduyar Lindskog u...@tezduyar.com wrote:
> [...]
Re: [systemd-devel] forever loop during garbage collection
On Mon, Dec 8, 2014 at 8:09 PM, Lennart Poettering lenn...@poettering.net wrote:
> On Sun, 30.11.14 14:38, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:
> > [...]
> > A is referenced by B
> > B is referenced by A
>
> So in this case A will first be processed by the GC sweep [...]
> Hence I am not seeing how this could end up in an endless loop.

I have debugged it more and, as you said, there is no bug in the code, but it takes so long to get out of unit_gc_sweep that I thought there was a forever loop.

Attached is my patch on 216, and https://drive.google.com/file/d/0B_uiALgWpGXtZ0VidURxSnVhcDA/view?usp=sharing is a part of the log after the patch. It has been 3 hours since I issued systemctl isolate, and according to the logs I can see that the garbage collection logic is making its way back up. I guess it will eventually resolve itself, but only after many hours. (Search for "- -"; it appears every 300,000 lines.)

The problem seems to have been introduced in 95ed329 ("Move handling of sysv initscripts to a generator"). This is entirely due to how the sysV generator links services, but I think GC slowness can also happen on a complex system with many interlinked units. Thoughts?

Umut

Attachment: 0001-Debugging-gc_sweep 1.03.18 PM.patch (binary data)
Re: [systemd-devel] forever loop during garbage collection
On Sun, 30.11.14 14:38, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:
> Hi,
>
> We are experiencing an unbreakable loop in manager_dispatch_gc_queue. [...] If I am reading the code correctly, the recursive unit_gc_sweep will never be able to remove the unit from the gc queue if it is referenced by another unit and that other unit is in turn referenced by the unit:
>
> A is referenced by B
> B is referenced by A

So in this case A will first be processed by the GC sweep: it will follow the link to B while setting A's state to IN_PATH, and invoke the GC sweep on B. B will then be set to IN_PATH too. The GC sweep now follows B's link back and ends up at A again, but this time returns quickly because A's state is set to IN_PATH. Due to this, it will then set B's state to UNSURE and return to A, which in effect will now be set to UNSURE too. Now we return into the GC queue dispatch call, which will notice that A is UNSURE, upgrade that to BAD, and kill it: there's nothing in the unit's dependency network that is clearly GOOD, and hence it should be removed.

The essence of cycle breaking here is really in manager_dispatch_gc_queue(), which upgrades UNSURE to BAD in the end. Hence I am not seeing how this could end up in an endless loop.

> We have this circular "referenced by" dependency between units, and I am quite sure it is due to sysV compatibility. I know that systemd does not allow circular dependencies between units (e.g. wants or after), but do we allow circular "referenced by" dependencies? If so, then it is expected that manager_dispatch_gc_queue gets stuck.
>
> We can reproduce it on 216/217 when we isolate a target.
>
> Note: Line http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n875 should be before http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n872, since unit_gc_sweep() sets u->in_gc_queue = true if it cannot make a decision and we then set it back to false.

This is intended. After the sweep has returned back to the anchor we can make our decision: either we add the unit to the cleanup queue, in which case it should be removed from the GC queue, or it is reinstated as a good unit that should continue to exist, in which case it should be removed from the GC queue too.

Can't see a bug here... Can you elaborate on how precisely you are encountering the GC loop?

Lennart

-- 
Lennart Poettering, Red Hat
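The cycle-breaking argument above can be made concrete with a small, hypothetical model of the sweep and the dispatch loop. It is a sketch under stated assumptions, not systemd's code. A plain state field replaces systemd's per-pass gc_marker offsets. A `needed` flag stands in for the unit's own reasons to stay loaded, and `collected` stands in for adding the unit to the cleanup queue. The names gc_sweep(), gc_dispatch() and gc_queue_add() are made up. The model also shows the ordering described above as intended: the anchor is dropped from the queue only after its sweep, so the sweep's attempt to re-queue an UNSURE anchor is a no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the unit GC; not systemd's code. */

enum gc_state { GC_NONE, GC_IN_PATH, GC_UNSURE, GC_GOOD, GC_BAD };

#define MAX_REFS 4
#define QUEUE_MAX 32

typedef struct Unit Unit;
struct Unit {
        const char *id;
        bool needed;                  /* stand-in: unit must stay on its own */
        Unit *referenced_by[MAX_REFS];
        size_t n_referenced_by;
        enum gc_state state;
        bool in_gc_queue;
        bool collected;               /* stand-in for "added to cleanup queue" */
};

typedef struct {
        Unit *items[QUEUE_MAX];
        size_t head, tail;
} GcQueue;

static void gc_queue_add(GcQueue *q, Unit *u) {
        if (u->in_gc_queue)
                return;               /* already queued: adding again is a no-op */
        assert(q->tail < QUEUE_MAX);
        q->items[q->tail++] = u;
        u->in_gc_queue = true;
}

static void gc_sweep(GcQueue *q, Unit *u) {
        /* Already decided, or we looped back into the current path. */
        if (u->state == GC_GOOD || u->state == GC_BAD || u->state == GC_IN_PATH)
                return;

        if (u->needed) {
                u->state = GC_GOOD;
                return;
        }

        u->state = GC_IN_PATH;
        bool is_bad = true;

        for (size_t i = 0; i < u->n_referenced_by; i++) {
                Unit *other = u->referenced_by[i];
                gc_sweep(q, other);
                if (other->state == GC_GOOD) {
                        u->state = GC_GOOD;   /* a referrer is kept, so keep u */
                        return;
                }
                if (other->state != GC_BAD)
                        is_bad = false;       /* referrer is IN_PATH or UNSURE */
        }

        if (is_bad) {
                u->state = GC_BAD;
                u->collected = true;
                return;
        }

        u->state = GC_UNSURE;                 /* undecided: look at it again later */
        gc_queue_add(q, u);
}

/* Drain the queue. An anchor that is still UNSURE after its sweep has
 * no GOOD unit reachable through its referrers (in this model), so it
 * is upgraded to BAD: this is where cycles get broken. */
static void gc_dispatch(GcQueue *q) {
        while (q->head < q->tail) {
                Unit *u = q->items[q->head];
                gc_sweep(q, u);               /* may try to re-queue u: no-op */
                q->head++;
                u->in_gc_queue = false;       /* cleared only after the sweep */
                if (u->state == GC_UNSURE || u->state == GC_BAD) {
                        u->state = GC_BAD;
                        u->collected = true;
                }
        }
}
```

With only A and B referencing each other, queueing A and dispatching collects both: A is upgraded from UNSURE to BAD, and B, queued as UNSURE, then finds A BAD. If a needed unit C also references A, both A and B end up GOOD and survive.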
[systemd-devel] forever loop during garbage collection
Hi,

We are experiencing an unbreakable loop in manager_dispatch_gc_queue. The problem happens when systemd runs in sysV compatibility mode (Porky enables this). It seems that manager_dispatch_gc_queue's while loop gets stuck, and it seems that unit_gc_sweep cannot make a decision about the unit. As a result, it marks the unit with offset_unsure and adds the unit back to the gc queue. If I am reading the code correctly, the recursive unit_gc_sweep will never be able to remove the unit from the gc queue if it is referenced by another unit and that other unit is in turn referenced by the unit:

A is referenced by B
B is referenced by A

We have this circular "referenced by" dependency between units, and I am quite sure it is due to sysV compatibility. I know that systemd does not allow circular dependencies between units (e.g. wants or after), but do we allow circular "referenced by" dependencies? If so, then it is expected that manager_dispatch_gc_queue gets stuck.

We can reproduce it on 216/217 when we isolate a target.

Note: Line http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n875 should be before http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n872, since unit_gc_sweep() sets u->in_gc_queue = true if it cannot make a decision and we then set it back to false.

Umut
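To check whether the units involved really form a circular "referenced by" relationship like the one in this report, one option is an ordinary three-color depth-first search over the referenced-by edges. The sketch below is hypothetical. It assumes the edges have already been collected into a small adjacency matrix; extracting them, for example from generator output, is not shown. RefGraph, add_refby() and has_refby_cycle() are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_UNITS 16

/* refby[a][b] == true means "unit a is referenced by unit b". */
typedef struct {
        size_t n;
        bool refby[MAX_UNITS][MAX_UNITS];
} RefGraph;

enum color { WHITE, GRAY, BLACK };   /* unvisited, on DFS path, finished */

static void add_refby(RefGraph *g, size_t unit, size_t referrer) {
        assert(unit < g->n && referrer < g->n);
        g->refby[unit][referrer] = true;
}

static bool dfs_finds_cycle(const RefGraph *g, size_t u, enum color *color) {
        color[u] = GRAY;
        for (size_t v = 0; v < g->n; v++) {
                if (!g->refby[u][v])
                        continue;
                if (color[v] == GRAY)
                        return true;          /* back edge: v is on the current path */
                if (color[v] == WHITE && dfs_finds_cycle(g, v, color))
                        return true;
        }
        color[u] = BLACK;                     /* no cycle passes through u */
        return false;
}

/* True if following "referenced by" edges can lead back to a unit. */
static bool has_refby_cycle(const RefGraph *g) {
        enum color color[MAX_UNITS] = { WHITE };
        assert(g->n <= MAX_UNITS);
        for (size_t u = 0; u < g->n; u++)
                if (color[u] == WHITE && dfs_finds_cycle(g, u, color))
                        return true;
        return false;
}
```

For the A/B case, add_refby(&g, 0, 1) and add_refby(&g, 1, 0) on a two-unit graph make has_refby_cycle() return true, while a plain chain returns false.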