Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)
David Herrmann wrote: > Krzysztof Kotlenga wrote: >> David Herrmann wrote: >>> Krzysztof Kotlenga wrote: >> >> Some more or less minor points still stand though: >> >> - errored/disabled bus-inputs are never freed, even after client >> disconnect (!) > > What's a "bus-input"? sd_bus.input_io_event_source sd_event_sources have a description set to "bus-input" to aid debugging: https://github.com/systemd/systemd/blob/v226/src/libsystemd/sd-bus/sd-bus.c#L3180 For example this return path: https://github.com/systemd/systemd/blob/v226/src/libsystemd/sd-bus/sd-bus.c#L2782 It's easy to "exercise" it with netcat (generate -EIO there), as I wrote before. Such sd_event_source gets disabled but it doesn't go anywhere, even after client disconnect. -- kjk ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)
Hi On Sat, Sep 26, 2015 at 7:56 PM, Krzysztof Kotlenga wrote: > David Herrmann wrote: > >> Krzysztof Kotlenga wrote: >>> Krzysztof Kotlenga wrote: >>> 2. Maybe I don't get something right - please enlighten me if that's the case - but prioq_put or rather shuffle_up function is buggy. (...) Am I wrong to believe that it should have swapped 2 and 1, not 2 and 0? >>> >>> OK, I've finally realised that I'm a moron iterating over a >>> heap-like structure... >> >> For the archives: Is this fixed with the prioq-compare fix? > > The point quoted above was bogus, but otherwise yes - the main issue > here is fixed with the said fix. > > Some more or less minor points still stand though: > > - errored/disabled bus-inputs are never freed, even after client > disconnect (!) What's a "bus-input"? > - data pointer comparison instead of index comparison for prioq > stability. Is this _really_ correct? Anyway, I made a pull > request... -> https://github.com/systemd/systemd/pull/1393 Thanks David ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)
David Herrmann wrote: > Krzysztof Kotlenga wrote: >> Krzysztof Kotlenga wrote: >> >>> 2. Maybe I don't get something right - please enlighten me if >>> that's the case - but prioq_put or rather shuffle_up function is >>> buggy. >>> >>> (...) >>> >>> Am I wrong to believe that it should have swapped 2 and 1, not 2 >>> and 0? >> >> OK, I've finally realised that I'm a moron iterating over a >> heap-like structure... > > For the archives: Is this fixed with the prioq-compare fix? The point quoted above was bogus, but otherwise yes - the main issue here is fixed with the said fix. Some more or less minor points still stand though: - errored/disabled bus-inputs are never freed, even after client disconnect (!) - data pointer comparison instead of index comparison for prioq stability. Is this _really_ correct? Anyway, I made a pull request... -- kjk ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)
Hey On Thu, Sep 24, 2015 at 1:15 AM, Krzysztof Kotlenga wrote: > Krzysztof Kotlenga wrote: > >> 2. Maybe I don't get something right - please enlighten me if that's >> the case - but prioq_put or rather shuffle_up function is buggy. >> >> (...) >> >> Am I wrong to believe that it should have swapped 2 and 1, not 2 and >> 0? > > OK, I've finally realised that I'm a moron iterating over a heap-like > structure... For the archives: Is this fixed with the prioq-compare fix? Thanks David ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)
Krzysztof Kotlenga wrote: > 2. Maybe I don't get something right - please enlighten me if that's > the case - but prioq_put or rather shuffle_up function is buggy. > > (...) > > Am I wrong to believe that it should have swapped 2 and 1, not 2 and > 0? OK, I've finally realised that I'm a moron iterating over a heap-like structure... -- kjk ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)
Krzysztof Kotlenga wrote: > Two things: > (...) One more. Prioq compare functions in sd-event.c contain this: /* Stability for the rest */ if (x < y) return -1; if (x > y) return 1; How comparing prioq_item.data pointers is going to ensure stability? Shouldn't it be: if (x->prepare_index < y->prepare_index) ... and so on? -- kjk ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)
Krzysztof Kotlenga wrote: > I recommend the following gdb script, because it provides some > possibly interesting output: Slightly fixed version: https://gist.github.com/pocek/709d8c098d632ed1f21e The first one was incorrectly printing pending_iteration instead of prepare_iteration field. Nothing major. > What's going on with order in that "prepare" Prioq? I've debugged this a bit more. Two things: 1. I think prepare_prioq_compare function is wrong: "enabled" field should be the first one to compare. 2. Maybe I don't get something right - please enlighten me if that's the case - but prioq_put or rather shuffle_up function is buggy. Let's start with prepare Prioq like this: bus-input e=1 pri=0 pi=8 fd=12 bus-input e=1 pri=0 pi=9 fd=18 <- netcat Now we enter some garbage in netcat: bus-input e=1 pri=0 pi=9 fd=12 bus-input e=0 pri=0 pi=9 fd=18 <- netcat BTW, some loop iterations later it is still OK if we have fixed the order in prepare_prioq_compare (see 1): bus-input e=1 pri=0 pi=10 fd=12 bus-input e=0 pri=0 pi=9 fd=18 <- netcat Good! Now let's make another connection. We have a breakpoint in prepare_prioq_compare. bus-input e=1 pri=0 pi=11 fd=12 bus-input e=0 pri=0 pi=9 fd=18 (null) e=1 pri=0 pi=0 fd=19 <- new connection, it doesn't have a description set yet. Backtrace for some context (shortened): #0 prepare_prioq_compare (a=0x81466e18, b=0x81477490) #1 shuffle_up (q=0x81466ee8, idx=2) #2 prioq_put (q=0x81466ee8, data=0x81477490, idx=0x814774b8) #3 sd_event_source_set_prepare (s=0x81477490, callback=0x80181f81 ) #4 attach_io_events (bus=0x81476d20) #5 sd_bus_attach_event (bus=0x81476d20, event=0x8140e6e0, priority=0) #6 bus_on_connection (s=0x81471680, fd=14, revents=1, userdata=0x8140e318) #7 source_dispatch (s=0x81471680) #8 sd_event_dispatch (e=0x8140e6e0) #9 sd_event_run (e=0x8140e6e0, timeout=18446744073709551615) #10 manager_loop (m=0x8140e318) #11 main (argc=4, argv=0xbfb1b254) OK, let's see what we have got here: (gdb) p x->prepare_index $2 = 0 (gdb) p y->prepare_index $3 = 2 So we are comparing the new sd_event_source with the first one in the queue. This is bad. Let's move to shuffle_up first. (gdb) frame 1 (gdb) p k $4 = 0 Right. So shuffle_up doesn't iterate more than once here. It has only swapped item 0 and 2. The prepare queue looks like this now: (null) e=1 pri=0 pi=0 fd=19 bus-input e=0 pri=0 pi=9 fd=18 bus-input e=1 pri=0 pi=11 fd=12 It doesn't look like a correct order judging by how prepare_event (sd-event.c) works. The last bus-input above is pretty much broken because s->prepare callback won't be called anymore. Am I wrong to believe that it should have swapped 2 and 1, not 2 and 0? Lennart, can you fix Prioq or correct me if I'm wrong, please? > This has been debugged on v225, but AFAICS master has the same issues. I'm debugging this now on v226 now. -- kjk ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)
Hi. There were many reports of "Looping too fast. Throttling execution a little" behavior but they never got anywhere. Not this time. At least for this particular case. Initially I was able to reproduce this issue just by calling `systemctl restart some_specific_service` - then most of further `systemctl` actions (except status) caused systemd to enter an infinite event loop. Long story short, I found out that entering some garbage on /run/systemd/private with netcat breaks systemd equally well. `systemctl daemon-reexec` fixes it. But before you decide to break yours PID 1 event loop, I recommend the following gdb script, because it provides some possibly interesting output: #v+ b event_prepare commands 1 set $size = e->prepare->n_items set $enabled_after_disabled = 0 set $i = 0 while $i < $size set $s = (sd_event_source*)e->prepare->items[$i].data if $s->type == SOURCE_IO printf "%s e=%d pri=%d pi=%u fd=%d\n", \ $s->description, $s->enabled, $s->priority, \ $s->pending_iteration, $s->io.fd else printf "%s e=%d\n", $s->description, $s->enabled end if $i++ > 0 if $s->enabled == SD_EVENT_ON && $previous->enabled == SD_EVENT_OFF set $enabled_after_disabled = 1 end end set $previous = $s end if $enabled_after_disabled print "XXX shouldn't happen?" else continue end end #v- Now let's follow this. Output shortened for brevity. 1. Attach to systemd and let it run. You will most likely see some event source, not of particular interest, like this: bus-input e=1 pri=0 pi=0 fd=12 2. All is good. Run `nc -U /run/systemd/private` bus-input e=1 pri=0 pi=0 fd=18 bus-input e=1 pri=0 pi=0 fd=12 3. We are here too. Enter a newline. bus-input e=1 pri=0 pi=0 fd=12 bus-input e=0 pri=0 pi=15 fd=18 4. This bus-input event source got disabled by source_dispatch, because bus_socket_auth_verify_server returned -EIO. Now wait, hit Ctrl+C, generate some events with systemctl - whatever. bus-input e=0 pri=0 pi=15 fd=18 bus-input e=1 pri=0 pi=0 fd=12 $1 = "XXX shouldn't happen?" 5. The event loop is broken now. It haven't entered an infinite loop *yet* but calls like `systemctl restart whatever` will cause it. The reason is that prepare callbacks in event_prepare (sd-event.c) aren't called anymore - the for loop there breaks on the first disabled source. The prepare callback of the bus that systemctl talks on (most importantly sd_bus_get_events) is called only once and sets POLLIN|POLLOUT flags. That POLLOUT is left there as long as the client is on the bus and causes an infinite event loop. When we exit that hanging systemctl, systemd quits looping but the event loop is still broken. That disabled bus-input from point 4 is left there forever and prevents prepare callbacks of the enabled event sources to run. So, how many bugs are here? It seems that there are other errors than can happen in sd-bus / bus-socket, which bus_process_internal (sd-bus.c) doesn't handle and leaves the disabled event source sitting there forever. Those disabled event sources are: - a memory leak? - very good at breaking event_prepare (sd-event.c). What's going on with order in that "prepare" Prioq? This has been debugged on v225, but AFAICS master has the same issues. -- kjk ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel