Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)

2015-09-29 Thread Krzysztof Kotlenga
David Herrmann wrote:

> Krzysztof Kotlenga wrote:
>> David Herrmann wrote:
>>> Krzysztof Kotlenga wrote:
>>
>> Some more or less minor points still stand though:
>>
>> - errored/disabled bus-inputs are never freed, even after client
>>   disconnect (!)
> 
> What's a "bus-input"?

sd_bus.input_io_event_source sd_event_sources have a description set to
"bus-input" to aid debugging:
https://github.com/systemd/systemd/blob/v226/src/libsystemd/sd-bus/sd-bus.c#L3180

For example this return path:
https://github.com/systemd/systemd/blob/v226/src/libsystemd/sd-bus/sd-bus.c#L2782

It's easy to "exercise" it with netcat (generate -EIO there), as I
wrote before. Such sd_event_source gets disabled but it doesn't go
anywhere, even after client disconnect.

-- 
kjk
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)

2015-09-29 Thread David Herrmann
Hi

On Sat, Sep 26, 2015 at 7:56 PM, Krzysztof Kotlenga  wrote:
> David Herrmann wrote:
>
>> Krzysztof Kotlenga wrote:
>>> Krzysztof Kotlenga wrote:
>>>
 2. Maybe I don't get something right - please enlighten me if
 that's the case - but prioq_put or rather shuffle_up function is
 buggy.

 (...)

 Am I wrong to believe that it should have swapped 2 and 1, not 2
 and 0?
>>>
>>> OK, I've finally realised that I'm a moron iterating over a
>>> heap-like structure...
>>
>> For the archives: Is this fixed with the prioq-compare fix?
>
> The point quoted above was bogus, but otherwise yes - the main issue
> here is fixed with the said fix.
>
> Some more or less minor points still stand though:
>
> - errored/disabled bus-inputs are never freed, even after client
>   disconnect (!)

What's a "bus-input"?

> - data pointer comparison instead of index comparison for prioq
>   stability. Is this _really_ correct? Anyway, I made a pull
>   request...

-> https://github.com/systemd/systemd/pull/1393

Thanks
David
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)

2015-09-26 Thread Krzysztof Kotlenga
David Herrmann wrote:

> Krzysztof Kotlenga wrote:
>> Krzysztof Kotlenga wrote:
>>
>>> 2. Maybe I don't get something right - please enlighten me if
>>> that's the case - but prioq_put or rather shuffle_up function is
>>> buggy.
>>>
>>> (...)
>>>
>>> Am I wrong to believe that it should have swapped 2 and 1, not 2
>>> and 0?
>>
>> OK, I've finally realised that I'm a moron iterating over a
>> heap-like structure...
> 
> For the archives: Is this fixed with the prioq-compare fix?

The point quoted above was bogus, but otherwise yes - the main issue
here is fixed with the said fix.

Some more or less minor points still stand though:

- errored/disabled bus-inputs are never freed, even after client
  disconnect (!)

- data pointer comparison instead of index comparison for prioq
  stability. Is this _really_ correct? Anyway, I made a pull
  request...

-- 
kjk
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)

2015-09-26 Thread David Herrmann
Hey

On Thu, Sep 24, 2015 at 1:15 AM, Krzysztof Kotlenga  wrote:
> Krzysztof Kotlenga wrote:
>
>> 2. Maybe I don't get something right - please enlighten me if that's
>> the case - but prioq_put or rather shuffle_up function is buggy.
>>
>> (...)
>>
>> Am I wrong to believe that it should have swapped 2 and 1, not 2 and
>> 0?
>
> OK, I've finally realised that I'm a moron iterating over a heap-like
> structure...

For the archives: Is this fixed with the prioq-compare fix?

Thanks
David
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)

2015-09-23 Thread Krzysztof Kotlenga
Krzysztof Kotlenga wrote:

> 2. Maybe I don't get something right - please enlighten me if that's
> the case - but prioq_put or rather shuffle_up function is buggy. 
> 
> (...)
> 
> Am I wrong to believe that it should have swapped 2 and 1, not 2 and
> 0?

OK, I've finally realised that I'm a moron iterating over a heap-like
structure...

-- 
kjk
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)

2015-09-23 Thread Krzysztof Kotlenga
Krzysztof Kotlenga wrote:

> Two things:
> (...)

One more. Prioq compare functions in sd-event.c contain this:

/* Stability for the rest */
if (x < y)
return -1;
if (x > y)
return 1;


How comparing prioq_item.data pointers is going to ensure stability?
Shouldn't it be:

if (x->prepare_index < y->prepare_index)
...

and so on?

-- 
kjk
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)

2015-09-23 Thread Krzysztof Kotlenga
Krzysztof Kotlenga wrote:

> I recommend the following gdb script, because it provides some
> possibly interesting output:

Slightly fixed version:
https://gist.github.com/pocek/709d8c098d632ed1f21e

The first one was incorrectly printing pending_iteration instead of
prepare_iteration field. Nothing major.

> What's going on with order in that "prepare" Prioq?

I've debugged this a bit more. Two things:

1. I think prepare_prioq_compare function is wrong: "enabled" field
should be the first one to compare.

2. Maybe I don't get something right - please enlighten me if that's the
case - but prioq_put or rather shuffle_up function is buggy. 

Let's start with prepare Prioq like this:
bus-input e=1 pri=0 pi=8 fd=12
bus-input e=1 pri=0 pi=9 fd=18 <- netcat

Now we enter some garbage in netcat:
bus-input e=1 pri=0 pi=9 fd=12
bus-input e=0 pri=0 pi=9 fd=18 <- netcat

BTW, some loop iterations later it is still OK if we have fixed the
order in prepare_prioq_compare (see 1):

bus-input e=1 pri=0 pi=10 fd=12
bus-input e=0 pri=0 pi=9 fd=18 <- netcat

Good! Now let's make another connection. We have a breakpoint in
prepare_prioq_compare.

bus-input e=1 pri=0 pi=11 fd=12
bus-input e=0 pri=0 pi=9 fd=18
(null) e=1 pri=0 pi=0 fd=19 <- new connection, it doesn't have a
   description set yet.

Backtrace for some context (shortened):
#0  prepare_prioq_compare (a=0x81466e18, b=0x81477490) 
#1  shuffle_up (q=0x81466ee8, idx=2)
#2  prioq_put (q=0x81466ee8, data=0x81477490, idx=0x814774b8)
#3  sd_event_source_set_prepare (s=0x81477490, callback=0x80181f81 
)
#4  attach_io_events (bus=0x81476d20)
#5  sd_bus_attach_event (bus=0x81476d20, event=0x8140e6e0, priority=0)
#6  bus_on_connection (s=0x81471680, fd=14, revents=1, userdata=0x8140e318)
#7  source_dispatch (s=0x81471680)
#8  sd_event_dispatch (e=0x8140e6e0)
#9  sd_event_run (e=0x8140e6e0, timeout=18446744073709551615)
#10 manager_loop (m=0x8140e318)
#11 main (argc=4, argv=0xbfb1b254)

OK, let's see what we have got here:
(gdb) p x->prepare_index
$2 = 0
(gdb) p y->prepare_index
$3 = 2

So we are comparing the new sd_event_source with the first one in the
queue. This is bad. Let's move to shuffle_up first.

(gdb) frame 1
(gdb) p k
$4 = 0

Right. So shuffle_up doesn't iterate more than once here. It has only
swapped item 0 and 2. The prepare queue looks like this now:

(null) e=1 pri=0 pi=0 fd=19
bus-input e=0 pri=0 pi=9 fd=18
bus-input e=1 pri=0 pi=11 fd=12

It doesn't look like a correct order judging by how prepare_event
(sd-event.c) works. The last bus-input above is pretty much broken
because s->prepare callback won't be called anymore.

Am I wrong to believe that it should have swapped 2 and 1, not 2 and 0?

Lennart, can you fix Prioq or correct me if I'm wrong, please?

> This has been debugged on v225, but AFAICS master has the same issues.

I'm debugging this now on v226 now.

-- 
kjk
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Disabled event sources / errors on sd-bus break the event loop (Looping too fast)

2015-09-22 Thread Krzysztof Kotlenga
Hi.

There were many reports of "Looping too fast. Throttling execution a
little" behavior but they never got anywhere. Not this time. At least
for this particular case.

Initially I was able to reproduce this issue just by calling `systemctl
restart some_specific_service` - then most of further `systemctl`
actions (except status) caused systemd to enter an infinite event loop.

Long story short, I found out that entering some garbage
on /run/systemd/private with netcat breaks systemd equally well.
`systemctl daemon-reexec` fixes it. But before you decide to break
yours PID 1 event loop, I recommend the following gdb script, because
it provides some possibly interesting output:

#v+
b event_prepare
commands 1
set $size = e->prepare->n_items
set $enabled_after_disabled = 0
set $i = 0
while $i < $size
set $s = (sd_event_source*)e->prepare->items[$i].data
if $s->type == SOURCE_IO
printf "%s e=%d pri=%d pi=%u fd=%d\n", \
$s->description, $s->enabled, $s->priority, \
$s->pending_iteration, $s->io.fd
else
printf "%s e=%d\n", $s->description, $s->enabled
end

if $i++ > 0
if $s->enabled == SD_EVENT_ON && $previous->enabled == SD_EVENT_OFF
set $enabled_after_disabled = 1
end
end

set $previous = $s
end

if $enabled_after_disabled
print "XXX shouldn't happen?"
else
continue
end
end
#v-

Now let's follow this. Output shortened for brevity.

1. Attach to systemd and let it run. You will most likely see some
event source, not of particular interest, like this:

bus-input e=1 pri=0 pi=0 fd=12

2.  All is good. Run `nc -U /run/systemd/private`

bus-input e=1 pri=0 pi=0 fd=18
bus-input e=1 pri=0 pi=0 fd=12

3. We are here too. Enter a newline.

bus-input e=1 pri=0 pi=0 fd=12
bus-input e=0 pri=0 pi=15 fd=18

4. This bus-input event source got disabled by source_dispatch, because
bus_socket_auth_verify_server returned -EIO. Now wait, hit Ctrl+C,
generate some events with systemctl - whatever.

bus-input e=0 pri=0 pi=15 fd=18
bus-input e=1 pri=0 pi=0 fd=12
$1 = "XXX shouldn't happen?"

5. The event loop is broken now. It haven't entered an infinite loop
*yet* but calls like `systemctl restart whatever` will cause it. The
reason is that prepare callbacks in event_prepare (sd-event.c) aren't
called anymore - the for loop there breaks on the first disabled source.

The prepare callback of the bus that systemctl talks on (most
importantly sd_bus_get_events) is called only once and sets
POLLIN|POLLOUT flags. That POLLOUT is left there as long as the client
is on the bus and causes an infinite event loop.

When we exit that hanging systemctl, systemd quits looping but the
event loop is still broken. That disabled bus-input from point 4 is
left there forever and prevents prepare callbacks of the enabled event
sources to run.

So, how many bugs are here?

It seems that there are other errors than can happen in sd-bus /
bus-socket, which bus_process_internal (sd-bus.c) doesn't handle and
leaves the disabled event source sitting there forever. Those disabled
event sources are:
- a memory leak?
- very good at breaking event_prepare (sd-event.c).

What's going on with order in that "prepare" Prioq?

This has been debugged on v225, but AFAICS master has the same issues.

-- 
kjk
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel