Re: [LEDE-DEV] [PATCH] ubus cli: wait_for: fix race causing false timeouts
On 10/07/2016 03:19 PM, Zefir Kurtisi wrote: > On 10/07/2016 02:15 PM, Alexandru Ardelean wrote: >> On Fri, Oct 7, 2016 at 3:09 PM, Felix Fietkauwrote: >>> Instead of introducing yet another timer, wouldn't it also be possible >>> to close this race window by registering the event handler before >>> attempting the lookup? >>> >>> - Felix >> >> I've also seen this race. >> I tried something like this: >> https://github.com/commodo/ubus/commit/8c3986caaa7cd2c12f2b8907ceea54c5bdce3bd2 >> >> But never got around to doing much testing to see if the race goes >> away completely. >> So, I never pushed it upstream. >> >> @Zefir, maybe you could try it ? >> >> Thanks >> Alex >> > Hi Alex, > > your assumption is right, that's the root cause for the random timeouts. > > Unfortunately, it is hard to provide a positive proof, since for me the effect > went away when I added some logging in between. > > My patch made it disappear, but of course what Felix suggests and you already > implemented is the better approach. I'll take your commit instead and test it. > From looking at the changes it should do, but to get some confidence it will > take > some time. > > After some days of testing, it seems like the patch fixes the race. Alex, feel free to add me to the 'Tested-by' list if you are going to make a pull request for that commit. Cheers, Zefir ___ Lede-dev mailing list Lede-dev@lists.infradead.org http://lists.infradead.org/mailman/listinfo/lede-dev
Re: [LEDE-DEV] [PATCH] ubus cli: wait_for: fix race causing false timeouts
On 10/07/2016 02:15 PM, Alexandru Ardelean wrote: > On Fri, Oct 7, 2016 at 3:09 PM, Felix Fietkauwrote: >> Instead of introducing yet another timer, wouldn't it also be possible >> to close this race window by registering the event handler before >> attempting the lookup? >> >> - Felix >> >> ___ >> Lede-dev mailing list >> Lede-dev@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/lede-dev > > I've also seen this race. > I tried something like this: > https://github.com/commodo/ubus/commit/8c3986caaa7cd2c12f2b8907ceea54c5bdce3bd2 > > But never got around to doing much testing to see if the race goes > away completely. > So, I never pushed it upstream. > > @Zefir, maybe you could try it ? > > Thanks > Alex > Hi Alex, your assumption is right, that's the root cause for the random timeouts. Unfortunately, it is hard to provide a positive proof, since for me the effect went away when I added some logging in between. My patch made it disappear, but of course what Felix suggests and you already implemented is the better approach. I'll take your commit instead and test it. >From looking at the changes it should do, but to get some confidence it will >take some time. Cheers, Zefir ___ Lede-dev mailing list Lede-dev@lists.infradead.org http://lists.infradead.org/mailman/listinfo/lede-dev
Re: [LEDE-DEV] [PATCH] ubus cli: wait_for: fix race causing false timeouts
On Fri, Oct 7, 2016 at 3:09 PM, Felix Fietkauwrote: > On 2016-10-07 13:57, Zefir Kurtisi wrote: >> In ubus_cli_wait_for() there is a critical section between >> initially checking for the requested services and the >> following handling of 'ubus.object.add' events. >> >> In our system we let procd (re)start services and synchronize >> inter-service dependencies by using 'ubus wait_for' in the >> initscripts' service_started() functions. There we observe >> that 'wait_for' randomly is waiting for the full timeout >> and returning UBUS_STATUS_TIMEOUT, even if the service it >> is waiting for is already up and running. >> >> This happens when the service is started in the critical >> section mentioned above. This commit adds periodic lookup >> for the requested services while waiting for the 'add' event >> and with that fixes the observed failure. >> >> Signed-off-by: Zefir Kurtisi > Instead of introducing yet another timer, wouldn't it also be possible > to close this race window by registering the event handler before > attempting the lookup? > > - Felix > > ___ > Lede-dev mailing list > Lede-dev@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/lede-dev I've also seen this race. I tried something like this: https://github.com/commodo/ubus/commit/8c3986caaa7cd2c12f2b8907ceea54c5bdce3bd2 But never got around to doing much testing to see if the race goes away completely. So, I never pushed it upstream. @Zefir, maybe you could try it ? Thanks Alex ___ Lede-dev mailing list Lede-dev@lists.infradead.org http://lists.infradead.org/mailman/listinfo/lede-dev
Re: [LEDE-DEV] [PATCH] ubus cli: wait_for: fix race causing false timeouts
On 2016-10-07 13:57, Zefir Kurtisi wrote: > In ubus_cli_wait_for() there is a critical section between > initially checking for the requested services and the > following handling of 'ubus.object.add' events. > > In our system we let procd (re)start services and synchronize > inter-service dependencies by using 'ubus wait_for' in the > initscripts' service_started() functions. There we observe > that 'wait_for' randomly is waiting for the full timeout > and returning UBUS_STATUS_TIMEOUT, even if the service it > is waiting for is already up and running. > > This happens when the service is started in the critical > section mentioned above. This commit adds periodic lookup > for the requested services while waiting for the 'add' event > and with that fixes the observed failure. > > Signed-off-by: Zefir KurtisiInstead of introducing yet another timer, wouldn't it also be possible to close this race window by registering the event handler before attempting the lookup? - Felix ___ Lede-dev mailing list Lede-dev@lists.infradead.org http://lists.infradead.org/mailman/listinfo/lede-dev