Re: [systemd-devel] forever loop during garbage collection

2015-02-03 Thread Lennart Poettering
On Wed, 10.12.14 15:22, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:

Sorry for the late reply.

 On Mon, Dec 8, 2014 at 8:09 PM, Lennart Poettering
 lenn...@poettering.net wrote:
  On Sun, 30.11.14 14:38, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:
 
  Hi,
 
  We are experiencing an unbreakable loop in manager_dispatch_gc_queue.
  Problem happens when systemd runs in sysV compatibility mode (Porky
  enables this).
 
  Seems like manager_dispatch_gc_queue's while loop gets stuck and seems
  like unit_gc_sweep cannot make a decision about the unit. As a result,
  it marks the unit with offset_unsure and adds the unit back to gc
  queue.
 
  If I am reading the code correctly recursive unit_gc_sweep will never
  be able to remove the unit from the gc queue if it is referenced by
  another unit and if another unit is referenced by the unit.
 
  A is referenced by B
  B is referenced by A
 
  So in this case first A will be processed by the GC sweep, it will
  follow the link to B while setting the state to IN_PATH and invoke the
  GC sweep on that. B will then be set to IN_PATH too. GC sweep now
  follows its link back and ends up at A again, but this time returns
  quickly because its state is set to IN_PATH. Due to this, it will then
  set B's state to UNSURE, and return to A, which in effect will now be
  set to UNSURE too. Now, we return into the GC queue dispatch call,
  which will notice that it is UNSURE and upgrade that to BAD, and kill
  it because there's nothing in the unit's dependency network that is
  clearly GOOD, and hence it should be removed.
 
  The essence of cycle breaking here is really in
  manager_dispatch_gc_queue() which upgrades UNSURE to BAD in the end. I
  am not seeing how this could end up in an endless loop hence.
 
 I have debugged it more and, as you said, there is no bug in the code,
 but it takes so long to get out of unit_gc_sweep that I thought there
 was a forever loop.
 
 Attached is my patch on 216 and
 https://drive.google.com/file/d/0B_uiALgWpGXtZ0VidURxSnVhcDA/view?usp=sharing
 is a part of the log after patch.
 
 It has been 3 hours since I issued systemctl isolate, and according
 to the logs the garbage collection logic is still making its way
 back up. I guess it will eventually resolve itself, but only after
 many hours.

Hmm, so, you mean the code works correctly but scales really badly?
How many units do you have?

 
 (Search for - - and it is happening every 300.000 lines)
 
 Problem seemed to be introduced on 95ed329 - Move handling of sysv
 initscripts to a generator.

Hmm, what precisely do the deps look like that the generator creates
for you?

Any chance you can run

  /usr/lib/systemd/system-generators/systemd-sysv-generator /tmp/foo /tmp/foo /tmp/foo

and check precisely what deps it generates in /tmp/foo?

I have never seen the GC scale this badly...

 This is totally due to how sysV generator is linking services but I
 think slowness on GC can happen on a complex system with many units
 linked with each other.
 
 Thoughts?

I am puzzled, quite frankly...

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] forever loop during garbage collection

2014-12-29 Thread Umut Tezduyar Lindskog
Ping?

On Wednesday, December 10, 2014, Umut Tezduyar Lindskog u...@tezduyar.com
wrote:

 On Mon, Dec 8, 2014 at 8:09 PM, Lennart Poettering
 lenn...@poettering.net wrote:
  On Sun, 30.11.14 14:38, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:
 
  Hi,
 
  We are experiencing an unbreakable loop in manager_dispatch_gc_queue.
  Problem happens when systemd runs in sysV compatibility mode (Porky
  enables this).
 
  Seems like manager_dispatch_gc_queue's while loop gets stuck and seems
  like unit_gc_sweep cannot make a decision about the unit. As a result,
  it marks the unit with offset_unsure and adds the unit back to gc
  queue.
 
  If I am reading the code correctly recursive unit_gc_sweep will never
  be able to remove the unit from the gc queue if it is referenced by
  another unit and if another unit is referenced by the unit.
 
  A is referenced by B
  B is referenced by A
 
  So in this case first A will be processed by the GC sweep, it will
  follow the link to B while setting the state to IN_PATH and invoke the
  GC sweep on that. B will then be set to IN_PATH too. GC sweep now
  follows its link back and ends up at A again, but this time returns
  quickly because its state is set to IN_PATH. Due to this, it will then
  set B's state to UNSURE, and return to A, which in effect will now be
  set to UNSURE too. Now, we return into the GC queue dispatch call,
  which will notice that it is UNSURE and upgrade that to BAD, and kill
  it because there's nothing in the unit's dependency network that is
  clearly GOOD, and hence it should be removed.
 
  The essence of cycle breaking here is really in
  manager_dispatch_gc_queue() which upgrades UNSURE to BAD in the end. I
  am not seeing how this could end up in an endless loop hence.

 I have debugged it more and, as you said, there is no bug in the code,
 but it takes so long to get out of unit_gc_sweep that I thought there
 was a forever loop.

 Attached is my patch on 216 and

 https://drive.google.com/file/d/0B_uiALgWpGXtZ0VidURxSnVhcDA/view?usp=sharing
 is a part of the log after patch.

 It has been 3 hours since I issued systemctl isolate, and according
 to the logs the garbage collection logic is still making its way
 back up. I guess it will eventually resolve itself, but only after
 many hours.

 (Search for - - and it is happening every 300.000 lines)

 Problem seemed to be introduced on 95ed329 - Move handling of sysv
 initscripts to a generator.

 This is totally due to how sysV generator is linking services but I
 think slowness on GC can happen on a complex system with many units
 linked with each other.

 Thoughts?
 Umut

 
 
  We have this circular referenced by dependency between units and I am
  quite sure they are due to sysV compatibility.
 
  I know that systemd does not allow circular dependencies between units
  (e.g. wants or after), but do we allow a circular "referenced by"
  dependency? If so, then it is expected that manager_dispatch_gc_queue
  gets stuck.
 
  We can reproduce it on 216/217 when we isolate a target.
 
  Note: Line
 http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n875
  should be before
 
 http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n872
  since unit_gc_sweep() sets u->in_gc_queue = true if it cannot make
  a decision and we set it back to false.
 
  This is intended. After the sweep returned back to the anchor we can
  make our decision: either add the unit to the cleanup queue in which
  case it should be removed from the GC queue, or it is reinstated as
  a good unit that should continue to exist, in which case it should be
  removed from the GC queue too.
 
  Can't see a bug here...
 
  Can you elaborate on how precisely you are encountering the GC loop?
 
  Lennart
 
  --
  Lennart Poettering, Red Hat



Re: [systemd-devel] forever loop during garbage collection

2014-12-10 Thread Umut Tezduyar Lindskog
On Mon, Dec 8, 2014 at 8:09 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Sun, 30.11.14 14:38, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:

 Hi,

 We are experiencing an unbreakable loop in manager_dispatch_gc_queue.
 Problem happens when systemd runs in sysV compatibility mode (Porky
 enables this).

 Seems like manager_dispatch_gc_queue's while loop gets stuck and seems
 like unit_gc_sweep cannot make a decision about the unit. As a result,
 it marks the unit with offset_unsure and adds the unit back to gc
 queue.

 If I am reading the code correctly recursive unit_gc_sweep will never
 be able to remove the unit from the gc queue if it is referenced by
 another unit and if another unit is referenced by the unit.

 A is referenced by B
 B is referenced by A

 So in this case first A will be processed by the GC sweep, it will
 follow the link to B while setting the state to IN_PATH and invoke the
 GC sweep on that. B will then be set to IN_PATH too. GC sweep now
 follows its link back and ends up at A again, but this time returns
 quickly because its state is set to IN_PATH. Due to this, it will then
 set B's state to UNSURE, and return to A, which in effect will now be
 set to UNSURE too. Now, we return into the GC queue dispatch call,
 which will notice that it is UNSURE and upgrade that to BAD, and kill
 it because there's nothing in the unit's dependency network that is
 clearly GOOD, and hence it should be removed.

 The essence of cycle breaking here is really in
 manager_dispatch_gc_queue() which upgrades UNSURE to BAD in the end. I
 am not seeing how this could end up in an endless loop hence.

I have debugged it more and, as you said, there is no bug in the code,
but it takes so long to get out of unit_gc_sweep that I thought there
was a forever loop.

Attached is my patch on 216 and
https://drive.google.com/file/d/0B_uiALgWpGXtZ0VidURxSnVhcDA/view?usp=sharing
is a part of the log after patch.

It has been 3 hours since I issued systemctl isolate, and according
to the logs the garbage collection logic is still making its way
back up. I guess it will eventually resolve itself, but only after
many hours.

(Search for - - and it is happening every 300.000 lines)

Problem seemed to be introduced on 95ed329 - Move handling of sysv
initscripts to a generator.

This is totally due to how sysV generator is linking services but I
think slowness on GC can happen on a complex system with many units
linked with each other.

Thoughts?
Umut
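
Nothing in this thread pins down why the sweep is so slow. Purely as an illustration of how a recursive sweep over densely linked units could blow up, here is a toy model in C. It is not the systemd code: `toy_sweep`, `toy_count_calls` and the `reuse_unsure` switch are hypothetical names, and the assumption being explored is that an UNSURE verdict from an earlier visit is not reused when the same unit is reached again along another path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: n units, every unit referenced by every other unit,
 * none of them clearly needed, so no sweep can ever conclude GOOD. */
enum { TOY_MAX_UNITS = 16 };
enum toy_state { TOY_UNVISITED, TOY_IN_PATH, TOY_UNSURE };

static enum toy_state toy_state_of[TOY_MAX_UNITS];
static unsigned long toy_calls;

/* Recursive sweep over the fully connected "referenced by" graph.
 * reuse_unsure is a hypothetical switch: when false, a unit already
 * marked UNSURE is walked again each time it is reached. */
static void toy_sweep(int u, int n, bool reuse_unsure) {
        toy_calls++;

        if (toy_state_of[u] == TOY_IN_PATH)
                return;                 /* walked into a cycle, stop here */
        if (reuse_unsure && toy_state_of[u] == TOY_UNSURE)
                return;                 /* earlier verdict is good enough */

        toy_state_of[u] = TOY_IN_PATH;
        for (int other = 0; other < n; other++)
                if (other != u)
                        toy_sweep(other, n, reuse_unsure);
        toy_state_of[u] = TOY_UNSURE;
}

/* Resets the toy graph and returns the number of toy_sweep() calls
 * needed for one sweep that starts at unit 0. */
static unsigned long toy_count_calls(int n, bool reuse_unsure) {
        assert(n > 0 && n <= TOY_MAX_UNITS);

        for (int i = 0; i < n; i++)
                toy_state_of[i] = TOY_UNVISITED;
        toy_calls = 0;
        toy_sweep(0, n, reuse_unsure);
        return toy_calls;
}
```

Comparing `toy_count_calls(n, true)` with `toy_count_calls(n, false)` for growing `n` shows the call count growing far faster when UNSURE verdicts are not reused. Whether the real unit_gc_sweep() behaves like either variant would have to be checked against the source and the generated deps.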



 We have this circular referenced by dependency between units and I am
 quite sure they are due to sysV compatibility.

 I know that systemd does not allow circular dependencies between units
 (e.g. wants or after), but do we allow a circular "referenced by"
 dependency? If so, then it is expected that manager_dispatch_gc_queue
 gets stuck.

 We can reproduce it on 216/217 when we isolate a target.

 Note: Line 
 http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n875
 should be before
 http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n872
 since unit_gc_sweep() sets u->in_gc_queue = true if it cannot make
 a decision and we set it back to false.

 This is intended. After the sweep returned back to the anchor we can
 make our decision: either add the unit to the cleanup queue in which
 case it should be removed from the GC queue, or it is reinstated as
 a good unit that should continue to exist, in which case it should be
 removed from the GC queue too.

 Can't see a bug here...

 Can you elaborate on how precisely you are encountering the GC loop?

 Lennart

 --
 Lennart Poettering, Red Hat


0001-Debugging-gc_sweep 1.03.18 PM.patch
Description: Binary data


Re: [systemd-devel] forever loop during garbage collection

2014-12-08 Thread Lennart Poettering
On Sun, 30.11.14 14:38, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:

 Hi,
 
 We are experiencing an unbreakable loop in manager_dispatch_gc_queue.
 Problem happens when systemd runs in sysV compatibility mode (Porky
 enables this).
 
 Seems like manager_dispatch_gc_queue's while loop gets stuck and seems
 like unit_gc_sweep cannot make a decision about the unit. As a result,
 it marks the unit with offset_unsure and adds the unit back to gc
 queue.
 
 If I am reading the code correctly recursive unit_gc_sweep will never
 be able to remove the unit from the gc queue if it is referenced by
 another unit and if another unit is referenced by the unit.
 
 A is referenced by B
 B is referenced by A

So in this case first A will be processed by the GC sweep, it will
follow the link to B while setting the state to IN_PATH and invoke the
GC sweep on that. B will then be set to IN_PATH too. GC sweep now
follows its link back and ends up at A again, but this time returns
quickly because its state is set to IN_PATH. Due to this, it will then
set B's state to UNSURE, and return to A, which in effect will now be
set to UNSURE too. Now, we return into the GC queue dispatch call,
which will notice that it is UNSURE and upgrade that to BAD, and kill
it because there's nothing in the unit's dependency network that is
clearly GOOD, and hence it should be removed.

The essence of cycle breaking here is really in
manager_dispatch_gc_queue() which upgrades UNSURE to BAD in the end. I
am not seeing how this could end up in an endless loop hence. 
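
The A/B walk-through above can be sketched as a small toy model in C. This is a simplified illustration, not the code in src/core/manager.c: the struct, the `in_use` flag and all `toy_*` names are assumptions made for the example, and only the IN_PATH/UNSURE/GOOD/BAD states come from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State names follow the mail; GC_UNVISITED and the layout are assumed. */
enum gc_state { GC_UNVISITED, GC_IN_PATH, GC_UNSURE, GC_GOOD, GC_BAD };

#define TOY_MAX_REFS 4

struct toy_unit {
        bool in_use;            /* hypothetical "clearly still needed" flag */
        enum gc_state state;
        struct toy_unit *referenced_by[TOY_MAX_REFS];
        size_t n_referenced_by;
};

/* Follows "referenced by" links. A unit that is already IN_PATH is not
 * entered again, which is what stops the recursion on a cycle. */
static void toy_gc_sweep(struct toy_unit *u) {
        bool all_bad = true;

        assert(u);

        if (u->state != GC_UNVISITED)
                return;
        if (u->in_use) {
                u->state = GC_GOOD;
                return;
        }

        u->state = GC_IN_PATH;
        for (size_t i = 0; i < u->n_referenced_by; i++) {
                struct toy_unit *other = u->referenced_by[i];

                toy_gc_sweep(other);
                if (other->state == GC_GOOD) {
                        u->state = GC_GOOD;     /* something needed refers to us */
                        return;
                }
                if (other->state != GC_BAD)
                        all_bad = false;        /* IN_PATH or UNSURE: no verdict */
        }
        u->state = all_bad ? GC_BAD : GC_UNSURE;
}

/* Dispatch step: once the sweep is back at the anchor, UNSURE is
 * upgraded to BAD, which is what breaks the cycle. */
static enum gc_state toy_gc_dispatch(struct toy_unit *anchor) {
        toy_gc_sweep(anchor);
        if (anchor->state == GC_UNSURE)
                anchor->state = GC_BAD;
        return anchor->state;
}

/* Builds "A is referenced by B, B is referenced by A" and sweeps A. */
static enum gc_state toy_gc_demo_cycle(bool b_in_use) {
        struct toy_unit a = { .state = GC_UNVISITED };
        struct toy_unit b = { .in_use = b_in_use, .state = GC_UNVISITED };

        a.referenced_by[0] = &b;
        a.n_referenced_by = 1;
        b.referenced_by[0] = &a;
        b.n_referenced_by = 1;

        return toy_gc_dispatch(&a);
}
```

In this model `toy_gc_demo_cycle(false)` ends with A marked BAD even though A and B reference each other, while `toy_gc_demo_cycle(true)` keeps A as GOOD because B is still needed.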

 
 We have this circular referenced by dependency between units and I am
 quite sure they are due to sysV compatibility.
 
 I know that systemd does not allow circular dependencies between units
 (e.g. wants or after), but do we allow a circular "referenced by"
 dependency? If so, then it is expected that manager_dispatch_gc_queue
 gets stuck.
 
 We can reproduce it on 216/217 when we isolate a target.
 
 Note: Line 
 http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n875
 should be before
 http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n872
 since unit_gc_sweep() sets u->in_gc_queue = true if it cannot make
 a decision and we set it back to false.

This is intended. After the sweep returned back to the anchor we can
make our decision: either add the unit to the cleanup queue in which
case it should be removed from the GC queue, or it is reinstated as
a good unit that should continue to exist, in which case it should be
removed from the GC queue too.

Can't see a bug here...

Can you elaborate on how precisely you are encountering the GC loop?

Lennart

-- 
Lennart Poettering, Red Hat


[systemd-devel] forever loop during garbage collection

2014-11-30 Thread Umut Tezduyar Lindskog
Hi,

We are experiencing an unbreakable loop in manager_dispatch_gc_queue.
Problem happens when systemd runs in sysV compatibility mode (Porky
enables this).

Seems like manager_dispatch_gc_queue's while loop gets stuck and seems
like unit_gc_sweep cannot make a decision about the unit. As a result,
it marks the unit with offset_unsure and adds the unit back to gc
queue.

If I am reading the code correctly recursive unit_gc_sweep will never
be able to remove the unit from the gc queue if it is referenced by
another unit and if another unit is referenced by the unit.

A is referenced by B
B is referenced by A

We have this circular referenced by dependency between units and I am
quite sure they are due to sysV compatibility.

I know that systemd does not allow circular dependencies between units
(e.g. wants or after), but do we allow a circular "referenced by"
dependency? If so, then it is expected that manager_dispatch_gc_queue
gets stuck.

We can reproduce it on 216/217 when we isolate a target.

Note: Line 
http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n875
should be before
http://cgit.freedesktop.org/systemd/systemd/tree/src/core/manager.c?id=941a643569dc6b53d0b334276d2a3cc0ed159e88#n872
since unit_gc_sweep() sets u->in_gc_queue = true if it cannot make
a decision and we set it back to false.

Umut