Re: [LEDE-DEV] [PATCH] ubus cli: wait_for: fix race causing false timeouts

2016-10-12 Thread Zefir Kurtisi
On 10/07/2016 03:19 PM, Zefir Kurtisi wrote:
> On 10/07/2016 02:15 PM, Alexandru Ardelean wrote:
>> On Fri, Oct 7, 2016 at 3:09 PM, Felix Fietkau  wrote:
>>> Instead of introducing yet another timer, wouldn't it also be possible
>>> to close this race window by registering the event handler before
>>> attempting the lookup?
>>>
>>> - Felix
>>
>> I've also seen this race.
>> I tried something like this:
>> https://github.com/commodo/ubus/commit/8c3986caaa7cd2c12f2b8907ceea54c5bdce3bd2
>>
>> But never got around to doing much testing to see if the race goes
>> away completely.
>> So, I never pushed it upstream.
>>
>> @Zefir, maybe you could try it ?
>>
>> Thanks
>> Alex
>>
> Hi Alex,
> 
> your assumption is right, that's the root cause for the random timeouts.
> 
> Unfortunately, it is hard to provide a positive proof, since for me the effect
> went away when I added some logging in between.
> 
> My patch made it disappear, but of course what Felix suggests and you already
> implemented is the better approach. I'll take your commit instead and test it.
> From looking at the changes it should do, but to get some confidence it will 
> take
> some time.
> 
> 
After some days of testing, it seems like the patch fixes the race.

Alex, feel free to add me to the 'Tested-by' list if you are going to make a 
pull
request for that commit.


Cheers,
Zefir


___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] [PATCH] ubus cli: wait_for: fix race causing false timeouts

2016-10-07 Thread Zefir Kurtisi
On 10/07/2016 02:15 PM, Alexandru Ardelean wrote:
> On Fri, Oct 7, 2016 at 3:09 PM, Felix Fietkau  wrote:
>> Instead of introducing yet another timer, wouldn't it also be possible
>> to close this race window by registering the event handler before
>> attempting the lookup?
>>
>> - Felix
>>
>> ___
>> Lede-dev mailing list
>> Lede-dev@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/lede-dev
> 
> I've also seen this race.
> I tried something like this:
> https://github.com/commodo/ubus/commit/8c3986caaa7cd2c12f2b8907ceea54c5bdce3bd2
> 
> But never got around to doing much testing to see if the race goes
> away completely.
> So, I never pushed it upstream.
> 
> @Zefir, maybe you could try it ?
> 
> Thanks
> Alex
> 
Hi Alex,

your assumption is right, that's the root cause for the random timeouts.

Unfortunately, it is hard to provide a positive proof, since for me the effect
went away when I added some logging in between.

My patch made it disappear, but of course what Felix suggests and you already
implemented is the better approach. I'll take your commit instead and test it.
>From looking at the changes it should do, but to get some confidence it will 
>take
some time.


Cheers,
Zefir

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] [PATCH] ubus cli: wait_for: fix race causing false timeouts

2016-10-07 Thread Alexandru Ardelean
On Fri, Oct 7, 2016 at 3:09 PM, Felix Fietkau  wrote:
> On 2016-10-07 13:57, Zefir Kurtisi wrote:
>> In ubus_cli_wait_for() there is a critical section between
>> initially checking for the requested services and the
>> following handling of 'ubus.object.add' events.
>>
>> In our system we let procd (re)start services and synchronize
>> inter-service dependencies by using 'ubus wait_for' in the
>> initscripts' service_started() functions. There we observe
>> that 'wait_for' randomly is waiting for the full timeout
>> and returning UBUS_STATUS_TIMEOUT, even if the service it
>> is waiting for is already up and running.
>>
>> This happens when the service is started in the critical
>> section mentioned above. This commit adds periodic lookup
>> for the requested services while waiting for the 'add' event
>> and with that fixes the observed failure.
>>
>> Signed-off-by: Zefir Kurtisi 
> Instead of introducing yet another timer, wouldn't it also be possible
> to close this race window by registering the event handler before
> attempting the lookup?
>
> - Felix
>
> ___
> Lede-dev mailing list
> Lede-dev@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/lede-dev

I've also seen this race.
I tried something like this:
https://github.com/commodo/ubus/commit/8c3986caaa7cd2c12f2b8907ceea54c5bdce3bd2

But never got around to doing much testing to see if the race goes
away completely.
So, I never pushed it upstream.

@Zefir, maybe you could try it ?

Thanks
Alex

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] [PATCH] ubus cli: wait_for: fix race causing false timeouts

2016-10-07 Thread Felix Fietkau
On 2016-10-07 13:57, Zefir Kurtisi wrote:
> In ubus_cli_wait_for() there is a critical section between
> initially checking for the requested services and the
> following handling of 'ubus.object.add' events.
> 
> In our system we let procd (re)start services and synchronize
> inter-service dependencies by using 'ubus wait_for' in the
> initscripts' service_started() functions. There we observe
> that 'wait_for' randomly is waiting for the full timeout
> and returning UBUS_STATUS_TIMEOUT, even if the service it
> is waiting for is already up and running.
> 
> This happens when the service is started in the critical
> section mentioned above. This commit adds periodic lookup
> for the requested services while waiting for the 'add' event
> and with that fixes the observed failure.
> 
> Signed-off-by: Zefir Kurtisi 
Instead of introducing yet another timer, wouldn't it also be possible
to close this race window by registering the event handler before
attempting the lookup?

- Felix

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev