On 5-12-2011 19:45, Sebastian Benoit wrote:
> I see relayd crashes like this: (1)
> fatal: relay_dispatch_pfe: invalid host id

> or like this: (2)
> fatal: pfe_dispatch_hce: invalid host id

> There is a race between the hce and the other children (pfe and relays)
> in the window between loading the configuration and the start of
> processing IMSG_HOST_STATUS messages.
> 
> The problem is that in hce_setup_events() the host checks are started before
> all children have received the complete configuration.

Yes, I experienced the same thing, see:
http://marc.info/?l=openbsd-bugs&m=132207738531052&w=2

> A quick hack is to insert a sleep(1) at the beginning of hce_setup_events().

No, that does not work: I've seen crashes with sleeps of up to 3 seconds
on my system.  And it is still a race.

> A fix might be to make 'invalid host id' non fatal:

That might lead to crashes later on, especially if the hce notifies
about new host ids that the other processes have not loaded yet.

> Another might be to inhibit the processing of IMSG_HOST_STATUS only until
> the configuration has been completed (that is after receiving IMSG_CFG_DONE):

I'm going to try this one.  I'm not sure how bad it is to discard
messages though.
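
To make the gating idea concrete, here is a minimal standalone sketch in C.
It is not relayd code: struct receiver, receiver_dispatch() and the MSG_*
constants are hypothetical stand-ins for a child process, its imsg dispatch
function, and IMSG_CFG_DONE / IMSG_HOST_STATUS.  It only models the rule
"drop host status updates until the configuration is complete".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message types, standing in for relayd's IMSG_* values. */
enum msg_type { MSG_CFG_DONE, MSG_HOST_STATUS };

/* Hypothetical per-process state for a child that receives host status. */
struct receiver {
	bool	 cfg_done;	/* true once the config has fully arrived */
	int	 processed;	/* status messages acted upon */
	int	 discarded;	/* status messages dropped as too early */
};

/*
 * Dispatch one message.  Returns 1 if the message was acted upon,
 * 0 if it was discarded because the configuration is not complete yet.
 */
static int
receiver_dispatch(struct receiver *r, enum msg_type type)
{
	assert(r != NULL);

	switch (type) {
	case MSG_CFG_DONE:
		r->cfg_done = true;
		return (1);
	case MSG_HOST_STATUS:
		if (!r->cfg_done) {
			/* Host table may be incomplete: drop, do not fatal. */
			r->discarded++;
			return (0);
		}
		r->processed++;
		return (1);
	}
	return (0);
}
```

In this model a status update that arrives early is simply lost.  Whether
that is acceptable depends on whether the sender reports the same host state
again later, which the sketch does not address.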
