Re: [LEDE-DEV] [PATCH] libubus: initialize list member for ubus_auto_conn's timer

2016-12-13 Thread Alexandru Ardelean
On Tue, Dec 13, 2016 at 9:52 AM, Felix Fietkau  wrote:
> On 2016-12-13 08:41, Alexandru Ardelean wrote:
>> Every once in a while, we'll get stacktrace:
>> ```
>> (gdb) bt
>> \#0  0xb7bc4668 in _list_del (entry=0x10015688 ) at 
>> /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:83
>> \#1  list_del (entry=0x10015688 ) at 
>> /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:90
>> \#2  uloop_timeout_cancel (timeout=timeout@entry=0x10015688 ) at 
>> /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/uloop.c:474
>> \#3  0x10003794 in ubus_auto_shutdown (conn=0x10015614 ) at 
>> /home/sandu/work/Wrt/openwrt/staging_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/usr/include/libubus.h:249
>> \#4  module_ubus_fini () at ubus.c:64
>> \#5  0x100013fc in main (argc=, argv=) at 
>> module.c:128
>> ```
>>
>> In our code, `ubus_auto_connect()` is called, then due to some logic,
>> the module has to quickly stop, calling ubus_auto_shutdown().
>>
>> It seems that there is case in where the `timer` timeout, is not yet
>> registered with the internal list of timeouts from uloop, which
>> ends up trying to delete an invalid list.
>>
>> So, one solution is to init the list element of the `timer` of
>> the `ubus_auto_conn` struct.
>>
>> Signed-off-by: Alexandru Ardelean 
> I think this patch is a workaround that simply hides the real issue.
> uloop_timeout_cancel will only call list_del if the timer 'pending' flag
> is set.
>
> It seems to me that you might be using ubus_auto_connect on
> uninitialized memory. If that's the case, you might run into issues in
> other places as well, and you should just add a memset to your application.
>
> - Felix

Hmmm, you're right.
It is only one module I'm seeing this in.
Maybe it's some weird memory corruption, or race ; since that
ubus_auto_conn struct is init-ed to 0 [since it's static].

Will dig deeper into that.

Sorry for the noise.

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] [PATCH] libubus: initialize list member for ubus_auto_conn's timer

2016-12-12 Thread Felix Fietkau
On 2016-12-13 08:41, Alexandru Ardelean wrote:
> Every once in a while, we'll get stacktrace:
> ```
> (gdb) bt
> \#0  0xb7bc4668 in _list_del (entry=0x10015688 ) at 
> /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:83
> \#1  list_del (entry=0x10015688 ) at 
> /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:90
> \#2  uloop_timeout_cancel (timeout=timeout@entry=0x10015688 ) at 
> /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/uloop.c:474
> \#3  0x10003794 in ubus_auto_shutdown (conn=0x10015614 ) at 
> /home/sandu/work/Wrt/openwrt/staging_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/usr/include/libubus.h:249
> \#4  module_ubus_fini () at ubus.c:64
> \#5  0x100013fc in main (argc=, argv=) at 
> module.c:128
> ```
> 
> In our code, `ubus_auto_connect()` is called, then due to some logic,
> the module has to quickly stop, calling ubus_auto_shutdown().
> 
> It seems that there is case in where the `timer` timeout, is not yet
> registered with the internal list of timeouts from uloop, which
> ends up trying to delete an invalid list.
> 
> So, one solution is to init the list element of the `timer` of
> the `ubus_auto_conn` struct.
> 
> Signed-off-by: Alexandru Ardelean 
I think this patch is a workaround that simply hides the real issue.
uloop_timeout_cancel will only call list_del if the timer 'pending' flag
is set.

It seems to me that you might be using ubus_auto_connect on
uninitialized memory. If that's the case, you might run into issues in
other places as well, and you should just add a memset to your application.

- Felix

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


[LEDE-DEV] [PATCH] libubus: initialize list member for ubus_auto_conn's timer

2016-12-12 Thread Alexandru Ardelean
Every once in a while, we'll get stacktrace:
```
(gdb) bt
\#0  0xb7bc4668 in _list_del (entry=0x10015688 ) at 
/home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:83
\#1  list_del (entry=0x10015688 ) at 
/home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:90
\#2  uloop_timeout_cancel (timeout=timeout@entry=0x10015688 ) at 
/home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/uloop.c:474
\#3  0x10003794 in ubus_auto_shutdown (conn=0x10015614 ) at 
/home/sandu/work/Wrt/openwrt/staging_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/usr/include/libubus.h:249
\#4  module_ubus_fini () at ubus.c:64
\#5  0x100013fc in main (argc=, argv=) at 
module.c:128
```

In our code, `ubus_auto_connect()` is called, then due to some logic,
the module has to quickly stop, calling ubus_auto_shutdown().

It seems that there is case in where the `timer` timeout, is not yet
registered with the internal list of timeouts from uloop, which
ends up trying to delete an invalid list.

So, one solution is to init the list element of the `timer` of
the `ubus_auto_conn` struct.

Signed-off-by: Alexandru Ardelean 
---
 libubus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libubus.c b/libubus.c
index 8163ff7..faa30d3 100644
--- a/libubus.c
+++ b/libubus.c
@@ -337,6 +337,7 @@ static void ubus_auto_connect_cb(struct uloop_timeout 
*timeout)
 void ubus_auto_connect(struct ubus_auto_conn *conn)
 {
conn->timer.cb = ubus_auto_connect_cb;
+   INIT_LIST_HEAD(>timer.list);
ubus_auto_connect_cb(>timer);
 }
 
-- 
2.6.6


___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev