Re: [Xenomai-core] Fragile lock usage tracking for auto-relax

2011-05-31 Thread Philippe Gerum
On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote:
 Hi Philippe,
 
 enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the
 in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in
 xnsynch_release triggers for RT threads:
 
   XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
 
 RT threads do not balance their lock and unlock syscalls, so their
 counter goes wild quite quickly.
 
 But just limiting the bug check to XNOTHER threads is not a solution
 either. How would we deal with the counter on scheduling policy changes?
 
 So my suggestion is to convert the auto-relax feature into a service
 that user space can request, based on a counter that user space maintains
 independently. I.e., we should create another shared word that user space
 itself increments and decrements on lock acquisitions/releases. The
 nucleus just tests it when deciding about the relax on return to user space.
 
 But before hacking in that direction, I'd like to hear if it makes
 sense to you.
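
A minimal sketch of the shared-word scheme proposed above (illustrative
only; lock_nesting and u_window are hypothetical names, not identifiers
from the Xenomai tree):

/* user space: bump a per-thread word shared with the nucleus around
   the mutex fast path; no syscall involved */
static inline void note_lock_acquired(unsigned long *lock_nesting)
{
	(*lock_nesting)++;	/* per-thread word, plain increment suffices */
}

static inline void note_lock_released(unsigned long *lock_nesting)
{
	(*lock_nesting)--;
}

/* nucleus: test the shared word when deciding about the relax on
   return to user space */
static int may_autorelax(struct xnthread *thread)
{
	return thread->u_window->lock_nesting == 0;	/* hypothetical layout */
}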

At first glance, this does not seem to address the root issue. The
bottom line is that we should not have any thread release a lock it
does not actually hold, kthread or not.

In that respect, xnsynch_release() looks fishy, because it may be called
from a context which is _not_ the lock owner, but rather the thread that
is deleting the lock owner; so assuming lastowner == current_thread when
releasing is wrong.

At the very least, the following patch would prevent
xnsynch_release_all_ownerships() from breaking badly. Likewise, the
fastlock stuff does not track the owner properly in the synchro object.
We should fix those issues before going further; they may be related to
the bug described.

Totally, genuinely, 100% untested.

diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
index 3a53527..0785533 100644
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
 XN_NO_HANDLE, threadh);
 
if (likely(fastlock == XN_NO_HANDLE)) {
+   xnsynch_set_owner(synch, thread);
xnthread_inc_rescnt(thread);
xnthread_clear_info(thread,
XNRMID | XNTIMEO | XNBREAK);
@@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)
 
	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));
 
-	lastowner = xnpod_current_thread();
+	lastowner = synch->owner ?: xnpod_current_thread();
	xnthread_dec_rescnt(lastowner);
	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
lastownerh = xnthread_handle(lastowner);

-- 
Philippe.





Re: [Xenomai-core] Fragile lock usage tracking for auto-relax

2011-05-31 Thread Gilles Chanteperdrix
On 05/31/2011 06:29 PM, Philippe Gerum wrote:
 On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote:
 Hi Philippe,

 enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the
 in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in
 xnsynch_release triggers for RT threads:

  XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);

 RT threads do not balance their lock and unlock syscalls, so their
 counter goes wild quite quickly.

 But just limiting the bug check to XNOTHER threads is not a solution
 either. How would we deal with the counter on scheduling policy changes?

 So my suggestion is to convert the auto-relax feature into a service
 that user space can request, based on a counter that user space maintains
 independently. I.e., we should create another shared word that user space
 itself increments and decrements on lock acquisitions/releases. The
 nucleus just tests it when deciding about the relax on return to user space.
 
 But before hacking in that direction, I'd like to hear if it makes
 sense to you.
 
 At first glance, this does not seem to address the root issue. The
 bottom line is that we should not have any thread release a lock it
 does not actually hold, kthread or not.
 
 In that respect, xnsynch_release() looks fishy, because it may be called
 from a context which is _not_ the lock owner, but rather the thread that
 is deleting the lock owner; so assuming lastowner == current_thread when
 releasing is wrong.
 
 At the very least, the following patch would prevent
 xnsynch_release_all_ownerships() from breaking badly. Likewise, the
 fastlock stuff does not track the owner properly in the synchro object.
 We should fix those issues before going further; they may be related to
 the bug described.

It looks to me like xnsynch_fast_release uses cmpxchg, so it will not set
the owner to NULL if the current owner is not the thread releasing the
mutex. Is that not sufficient?
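
For illustration, the essence of such a cmpxchg-based fast release is
sketched below (simplified; not the verbatim xnsynch_fast_release()
code, which also deals with a claimed/waiters bit in the lock word):

static inline int fast_release_sketch(xnarch_atomic_t *fastlock,
				      xnhandle_t cur)
{
	/* Reset the owner word to XN_NO_HANDLE only if it still holds
	   exactly this thread's handle; otherwise leave it untouched. */
	return xnarch_atomic_cmpxchg(fastlock, cur, XN_NO_HANDLE) == cur;
}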

-- 
Gilles.



Re: [Xenomai-core] Fragile lock usage tracking for auto-relax

2011-05-31 Thread Jan Kiszka
On 2011-05-31 18:29, Philippe Gerum wrote:
 On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote:
 Hi Philippe,

 enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the
 in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in
 xnsynch_release triggers for RT threads:

  XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);

 RT threads do not balance their lock and unlock syscalls, so their
 counter goes wild quite quickly.

 But just limiting the bug check to XNOTHER threads is not a solution
 either. How would we deal with the counter on scheduling policy changes?

 So my suggestion is to convert the auto-relax feature into a service
 that user space can request, based on a counter that user space maintains
 independently. I.e., we should create another shared word that user space
 itself increments and decrements on lock acquisitions/releases. The
 nucleus just tests it when deciding about the relax on return to user space.
 
 But before hacking in that direction, I'd like to hear if it makes
 sense to you.
 
 At first glance, this does not seem to address the root issue. The
 bottom line is that we should not have any thread release a lock it
 does not actually hold, kthread or not.
 
 In that respect, xnsynch_release() looks fishy, because it may be called
 from a context which is _not_ the lock owner, but rather the thread that
 is deleting the lock owner; so assuming lastowner == current_thread when
 releasing is wrong.
 
 At the very least, the following patch would prevent
 xnsynch_release_all_ownerships() from breaking badly. Likewise, the
 fastlock stuff does not track the owner properly in the synchro object.
 We should fix those issues before going further; they may be related to
 the bug described.
 
 Totally, genuinely, 100% untested.
 
 diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
 index 3a53527..0785533 100644
 --- a/ksrc/nucleus/synch.c
 +++ b/ksrc/nucleus/synch.c
 @@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
XN_NO_HANDLE, threadh);
  
   if (likely(fastlock == XN_NO_HANDLE)) {
 + xnsynch_set_owner(synch, thread);
   xnthread_inc_rescnt(thread);
   xnthread_clear_info(thread,
   XNRMID | XNTIMEO | XNBREAK);
 @@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)
  
 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));
 
 -	lastowner = xnpod_current_thread();
 +	lastowner = synch->owner ?: xnpod_current_thread();
 	xnthread_dec_rescnt(lastowner);
 	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
   lastownerh = xnthread_handle(lastowner);
 

That's maybe another problem, need to check.

Back to the original issue: with fastlock, kernel space has absolutely
no clue about how many locks user space may hold - unless someone is
contending for all those locks. IOW, you can't reliably track resource
ownership at kernel level without user space helping out. The current
way it helps (enforced syscalls for XNOTHER threads) is insufficient.

As an alternative to plain counting of ownership in user space, we could
adopt mainline's robust mutex mechanism (a user-space-maintained list)
that solves the release-all-ownerships issue. But I haven't looked into
the details yet.
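
For reference, the uncontended acquisition path that keeps the kernel
blind looks roughly like this (a simplified sketch, not the verbatim
user-space mutex code):

static inline int fast_acquire_sketch(xnarch_atomic_t *fastlock,
				      xnhandle_t cur)
{
	/* If the lock word still reads XN_NO_HANDLE, swap our handle in;
	   the lock is then taken without entering the kernel, so the
	   nucleus never sees this acquisition. */
	if (xnarch_atomic_cmpxchg(fastlock, XN_NO_HANDLE, cur)
	    == XN_NO_HANDLE)
		return 0;

	return -EBUSY;	/* contended: fall back to a syscall */
}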

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Fragile lock usage tracking for auto-relax

2011-05-31 Thread Philippe Gerum
On Tue, 2011-05-31 at 18:38 +0200, Gilles Chanteperdrix wrote:
 On 05/31/2011 06:29 PM, Philippe Gerum wrote:
  On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote:
  Hi Philippe,
 
  enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the
  in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in
  xnsynch_release triggers for RT threads:
 
 XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
 
  RT threads do not balance their lock and unlock syscalls, so their
  counter goes wild quite quickly.
 
  But just limiting the bug check to XNOTHER threads is not a solution
  either. How would we deal with the counter on scheduling policy changes?
 
  So my suggestion is to convert the auto-relax feature into a service
  that user space can request, based on a counter that user space maintains
  independently. I.e., we should create another shared word that user space
  itself increments and decrements on lock acquisitions/releases. The
  nucleus just tests it when deciding about the relax on return to user
  space.
 
  But before hacking in that direction, I'd like to hear if it makes
  sense to you.
  
  At first glance, this does not seem to address the root issue. The
  bottom line is that we should not have any thread release a lock it
  does not actually hold, kthread or not.
  
  In that respect, xnsynch_release() looks fishy, because it may be called
  from a context which is _not_ the lock owner, but rather the thread that
  is deleting the lock owner; so assuming lastowner == current_thread when
  releasing is wrong.
  
  At the very least, the following patch would prevent
  xnsynch_release_all_ownerships() from breaking badly. Likewise, the
  fastlock stuff does not track the owner properly in the synchro object.
  We should fix those issues before going further; they may be related to
  the bug described.
 
 It looks to me like xnsynch_fast_release uses cmpxchg, so it will not set
 the owner to NULL if the current owner is not the thread releasing the
 mutex. Is that not sufficient?
 

Yes, we need to move that swap into the IRQ-off section, so that we can
clear the owner there as well.
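
Roughly the shape of the change being suggested, as an untested sketch
(variables such as s, ownerh and lastownerh are assumed declared in the
surrounding release path; xnsynch_fastlock() is used loosely here):

	xnlock_get_irqsave(&nklock, s);
	/* swap the lock word and clear the in-kernel owner atomically
	   with respect to the rest of the release path */
	lastownerh = xnarch_atomic_cmpxchg(xnsynch_fastlock(synch),
					   ownerh, XN_NO_HANDLE);
	synch->owner = NULL;
	xnlock_put_irqrestore(&nklock, s);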

-- 
Philippe.





Re: [Xenomai-core] Fragile lock usage tracking for auto-relax

2011-05-31 Thread Gilles Chanteperdrix
On 05/31/2011 06:38 PM, Jan Kiszka wrote:
 On 2011-05-31 18:29, Philippe Gerum wrote:
 On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote:
 Hi Philippe,

 enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the
 in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in
 xnsynch_release triggers for RT threads:

 XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);

 RT threads do not balance their lock and unlock syscalls, so their
 counter goes wild quite quickly.

 But just limiting the bug check to XNOTHER threads is not a solution
 either. How would we deal with the counter on scheduling policy changes?

 So my suggestion is to convert the auto-relax feature into a service
 that user space can request, based on a counter that user space maintains
 independently. I.e., we should create another shared word that user space
 itself increments and decrements on lock acquisitions/releases. The
 nucleus just tests it when deciding about the relax on return to user space.
 
 But before hacking in that direction, I'd like to hear if it makes
 sense to you.

 At first glance, this does not seem to address the root issue. The
 bottom line is that we should not have any thread release a lock it
 does not actually hold, kthread or not.

 In that respect, xnsynch_release() looks fishy, because it may be called
 from a context which is _not_ the lock owner, but rather the thread that
 is deleting the lock owner; so assuming lastowner == current_thread when
 releasing is wrong.

 At the very least, the following patch would prevent
 xnsynch_release_all_ownerships() from breaking badly. Likewise, the
 fastlock stuff does not track the owner properly in the synchro object.
 We should fix those issues before going further; they may be related to
 the bug described.

 Totally, genuinely, 100% untested.

 diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
 index 3a53527..0785533 100644
 --- a/ksrc/nucleus/synch.c
 +++ b/ksrc/nucleus/synch.c
 @@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
 	 XN_NO_HANDLE, threadh);
 
 	if (likely(fastlock == XN_NO_HANDLE)) {
 +		xnsynch_set_owner(synch, thread);
  xnthread_inc_rescnt(thread);
  xnthread_clear_info(thread,
  XNRMID | XNTIMEO | XNBREAK);
 @@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)
  
 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));
 
 -	lastowner = xnpod_current_thread();
 +	lastowner = synch->owner ?: xnpod_current_thread();
 	xnthread_dec_rescnt(lastowner);
 	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
  lastownerh = xnthread_handle(lastowner);

 
 That's maybe another problem, need to check.
 
 Back to the original issue: with fastlock, kernel space has absolutely
 no clue about how many locks user space may hold - unless someone is
 contending for all those locks. IOW, you can't reliably track resource
 ownership at kernel level without user space helping out. The current
 way it helps (enforced syscalls for XNOTHER threads) is insufficient.

How so? If an XNOTHER thread goes through a syscall for each lock it
takes, we should be able to do the accounting in kernel-space.

-- 
Gilles.



Re: [Xenomai-core] Fragile lock usage tracking for auto-relax

2011-05-31 Thread Jan Kiszka
On 2011-05-31 18:50, Gilles Chanteperdrix wrote:
 On 05/31/2011 06:38 PM, Jan Kiszka wrote:
 On 2011-05-31 18:29, Philippe Gerum wrote:
 On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote:
 Hi Philippe,

 enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the
 in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in
 xnsynch_release triggers for RT threads:

 XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);

 RT threads do not balance their lock and unlock syscalls, so their
 counter goes wild quite quickly.

 But just limiting the bug check to XNOTHER threads is not a solution
 either. How would we deal with the counter on scheduling policy changes?

 So my suggestion is to convert the auto-relax feature into a service
 that user space can request, based on a counter that user space maintains
 independently. I.e., we should create another shared word that user space
 itself increments and decrements on lock acquisitions/releases. The
 nucleus just tests it when deciding about the relax on return to user
 space.
 
 But before hacking in that direction, I'd like to hear if it makes
 sense to you.

 At first glance, this does not seem to address the root issue. The
 bottom line is that we should not have any thread release a lock it
 does not actually hold, kthread or not.

 In that respect, xnsynch_release() looks fishy, because it may be called
 from a context which is _not_ the lock owner, but rather the thread that
 is deleting the lock owner; so assuming lastowner == current_thread when
 releasing is wrong.

 At the very least, the following patch would prevent
 xnsynch_release_all_ownerships() from breaking badly. Likewise, the
 fastlock stuff does not track the owner properly in the synchro object.
 We should fix those issues before going further; they may be related to
 the bug described.

 Totally, genuinely, 100% untested.

 diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
 index 3a53527..0785533 100644
 --- a/ksrc/nucleus/synch.c
 +++ b/ksrc/nucleus/synch.c
 @@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
  XN_NO_HANDLE, threadh);
  
 if (likely(fastlock == XN_NO_HANDLE)) {
 +   xnsynch_set_owner(synch, thread);
 xnthread_inc_rescnt(thread);
 xnthread_clear_info(thread,
 XNRMID | XNTIMEO | XNBREAK);
 @@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)
  
 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));
 
 -	lastowner = xnpod_current_thread();
 +	lastowner = synch->owner ?: xnpod_current_thread();
 	xnthread_dec_rescnt(lastowner);
 	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
 lastownerh = xnthread_handle(lastowner);


 That's maybe another problem, need to check.

 Back to the original issue: with fastlock, kernel space has absolutely
 no clue about how many locks user space may hold - unless someone is
 contending for all those locks. IOW, you can't reliably track resource
 ownership at kernel level without user space helping out. The current
 way it helps (enforced syscalls for XNOTHER threads) is insufficient.
 
 How so? If an XNOTHER thread goes through a syscall for each lock it
 takes, we should be able to do the accounting in kernel-space.

XNOTHER threads do (by convention, which is another reason not to throw
a BUG in the kernel), others don't. So RT threads have an invalid counter
when they are switched to non-RT (on a policy change).
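
As a concrete illustration of that inconsistency, consider this
hypothetical sequence:

/*
 * t0: thread runs SCHED_FIFO (RT); uncontended lock/unlock take the
 *     user-space fast path, so hrescnt is never touched
 * t1: thread acquires mutex A (no syscall issued, hrescnt still 0)
 * t2: pthread_setschedparam() switches it to SCHED_OTHER (XNOTHER),
 *     which enforces lock/unlock syscalls from that point on
 * t3: thread releases mutex A through a syscall; xnthread_dec_rescnt()
 *     drives hrescnt to -1
 * t4: XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0) fires
 */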

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Fragile lock usage tracking for auto-relax

2011-05-31 Thread Philippe Gerum
On Tue, 2011-05-31 at 18:38 +0200, Jan Kiszka wrote:
 On 2011-05-31 18:29, Philippe Gerum wrote:
  On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote:
  Hi Philippe,
 
  enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the
  in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in
  xnsynch_release triggers for RT threads:
 
 XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
 
  RT threads do not balance their lock and unlock syscalls, so their
  counter goes wild quite quickly.
 
  But just limiting the bug check to XNOTHER threads is not a solution
  either. How would we deal with the counter on scheduling policy changes?
 
  So my suggestion is to convert the auto-relax feature into a service
  that user space can request, based on a counter that user space maintains
  independently. I.e., we should create another shared word that user space
  itself increments and decrements on lock acquisitions/releases. The
  nucleus just tests it when deciding about the relax on return to user
  space.
 
  But before hacking in that direction, I'd like to hear if it makes
  sense to you.
  
  At first glance, this does not seem to address the root issue. The
  bottom line is that we should not have any thread release a lock it
  does not actually hold, kthread or not.
  
  In that respect, xnsynch_release() looks fishy, because it may be called
  from a context which is _not_ the lock owner, but rather the thread that
  is deleting the lock owner; so assuming lastowner == current_thread when
  releasing is wrong.
  
  At the very least, the following patch would prevent
  xnsynch_release_all_ownerships() from breaking badly. Likewise, the
  fastlock stuff does not track the owner properly in the synchro object.
  We should fix those issues before going further; they may be related to
  the bug described.
  
  Totally, genuinely, 100% untested.
  
  diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
  index 3a53527..0785533 100644
  --- a/ksrc/nucleus/synch.c
  +++ b/ksrc/nucleus/synch.c
  @@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
   XN_NO_HANDLE, threadh);
   
  if (likely(fastlock == XN_NO_HANDLE)) {
  +   xnsynch_set_owner(synch, thread);
  xnthread_inc_rescnt(thread);
  xnthread_clear_info(thread,
  XNRMID | XNTIMEO | XNBREAK);
  @@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)
   
  	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));
  
  -	lastowner = xnpod_current_thread();
  +	lastowner = synch->owner ?: xnpod_current_thread();
  	xnthread_dec_rescnt(lastowner);
  	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
  lastownerh = xnthread_handle(lastowner);
  
 
 That's maybe another problem, need to check.
 
 Back to the original issue: with fastlock, kernel space has absolutely
 no clue about how many locks user space may hold - unless someone is
 contending for all those locks. IOW, you can't reliably track resource
 ownership at kernel level without user space helping out. The current
 way it helps (enforced syscalls for XNOTHER threads) is insufficient.

The thing is: we don't care about knowing how many locks some
non-current thread owns. What the nucleus wants to know is whether the
_current user-space_ thread owns a lock, which is enough for the
autorelax management. This restricted scope makes the logic fine.

The existing resource counter is by no means a resource-tracking tool
that could be used from whatever context to query the number of locks an
arbitrary thread holds; it has never been intended that way. It only
answers one simple question: do I, as an XNOTHER thread, hold any lock?
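
In code form, that question amounts to something like the following
check on return to user space (a sketch; relax_to_secondary_mode() is a
hypothetical placeholder for the actual relax path):

	/* current thread only: it may auto-relax if it is XNOTHER and
	   holds no lock */
	if (xnthread_test_state(thread, XNOTHER)
	    && xnthread_get_rescnt(thread) == 0)
		relax_to_secondary_mode(thread);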

 
 As an alternative to plain counting of ownership in user space, we could
 adopt mainline's robust mutex mechanism (a user-space-maintained list)
 that solves the release-all-ownerships issue. But I haven't looked into
 the details yet.
 

Would be nice, but still overkill for the purpose of autorelax
management.

 Jan
 

-- 
Philippe.





Re: [Xenomai-core] Fragile lock usage tracking for auto-relax

2011-05-31 Thread Jan Kiszka
On 2011-05-31 18:58, Philippe Gerum wrote:
 On Tue, 2011-05-31 at 18:38 +0200, Jan Kiszka wrote:
 On 2011-05-31 18:29, Philippe Gerum wrote:
 On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote:
 Hi Philippe,

 enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the
 in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in
 xnsynch_release triggers for RT threads:

 XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);

 RT threads do not balance their lock and unlock syscalls, so their
 counter goes wild quite quickly.

 But just limiting the bug check to XNOTHER threads is not a solution
 either. How would we deal with the counter on scheduling policy changes?

 So my suggestion is to convert the auto-relax feature into a service
 that user space can request, based on a counter that user space maintains
 independently. I.e., we should create another shared word that user space
 itself increments and decrements on lock acquisitions/releases. The
 nucleus just tests it when deciding about the relax on return to user
 space.
 
 But before hacking in that direction, I'd like to hear if it makes
 sense to you.

 At first glance, this does not seem to address the root issue. The
 bottom line is that we should not have any thread release a lock it
 does not actually hold, kthread or not.

 In that respect, xnsynch_release() looks fishy, because it may be called
 from a context which is _not_ the lock owner, but rather the thread that
 is deleting the lock owner; so assuming lastowner == current_thread when
 releasing is wrong.

 At the very least, the following patch would prevent
 xnsynch_release_all_ownerships() from breaking badly. Likewise, the
 fastlock stuff does not track the owner properly in the synchro object.
 We should fix those issues before going further; they may be related to
 the bug described.

 Totally, genuinely, 100% untested.

 diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
 index 3a53527..0785533 100644
 --- a/ksrc/nucleus/synch.c
 +++ b/ksrc/nucleus/synch.c
 @@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
  XN_NO_HANDLE, threadh);
  
 if (likely(fastlock == XN_NO_HANDLE)) {
 +   xnsynch_set_owner(synch, thread);
 xnthread_inc_rescnt(thread);
 xnthread_clear_info(thread,
 XNRMID | XNTIMEO | XNBREAK);
 @@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)
  
 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));
 
 -	lastowner = xnpod_current_thread();
 +	lastowner = synch->owner ?: xnpod_current_thread();
 	xnthread_dec_rescnt(lastowner);
 	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
 lastownerh = xnthread_handle(lastowner);


 That's maybe another problem, need to check.

 Back to the original issue: with fastlock, kernel space has absolutely
 no clue about how many locks user space may hold - unless someone is
 contending for all those locks. IOW, you can't reliably track resource
 ownership at kernel level without user space helping out. The current
 way it helps (enforced syscalls for XNOTHER threads) is insufficient.
 
 The thing is: we don't care about knowing how many locks some
 non-current thread owns. What the nucleus wants to know is whether the
 _current user-space_ thread owns a lock, which is enough for the
 autorelax management. This restricted scope makes the logic fine.

Nope, this does not work for threads that undergo policy changes (see
reply to Gilles).

 
  The existing resource counter is by no means a resource-tracking tool
  that could be used from whatever context to query the number of locks an
  arbitrary thread holds; it has never been intended that way. It only
  answers one simple question: do I, as an XNOTHER thread, hold any lock?

Unfortunately, that was not encoded into the overeager bug check. Plus,
as stated before, the counter can't be kept consistent when the thread
was an RT thread before.

 

 As an alternative to plain counting of ownership in user space, we could
 adopt mainline's robust mutex mechanism (a user-space-maintained list)
 that solves the release-all-ownerships issue. But I haven't looked into
 the details yet.

 
 Would be nice, but still overkill for the purpose of autorelax
 management.

Right. But the existing code still needs a redesign, maybe not that big
a one, but it involves ABI changes. So...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Fragile lock usage tracking for auto-relax

2011-05-31 Thread Gilles Chanteperdrix
On 05/31/2011 07:06 PM, Jan Kiszka wrote:
 On 2011-05-31 18:58, Philippe Gerum wrote:
 On Tue, 2011-05-31 at 18:38 +0200, Jan Kiszka wrote:
 On 2011-05-31 18:29, Philippe Gerum wrote:
 On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote:
 Hi Philippe,

 enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the
 in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in
 xnsynch_release triggers for RT threads:

   XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);

 RT threads do not balance their lock and unlock syscalls, so their
 counter goes wild quite quickly.

 But just limiting the bug check to XNOTHER threads is not a solution
 either. How would we deal with the counter on scheduling policy changes?

  So my suggestion is to convert the auto-relax feature into a service
  that user space can request, based on a counter that user space maintains
  independently. I.e., we should create another shared word that user space
  itself increments and decrements on lock acquisitions/releases. The
  nucleus just tests it when deciding about the relax on return to user
  space.
 
  But before hacking in that direction, I'd like to hear if it makes
  sense to you.

 At first glance, this does not seem to address the root issue. The
 bottom line is that we should not have any thread release a lock it
 does not actually hold, kthread or not.

 In that respect, xnsynch_release() looks fishy, because it may be called
 from a context which is _not_ the lock owner, but rather the thread that
 is deleting the lock owner; so assuming lastowner == current_thread when
 releasing is wrong.

 At the very least, the following patch would prevent
 xnsynch_release_all_ownerships() from breaking badly. Likewise, the
 fastlock stuff does not track the owner properly in the synchro object.
 We should fix those issues before going further; they may be related to
 the bug described.

 Totally, genuinely, 100% untested.

 diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
 index 3a53527..0785533 100644
 --- a/ksrc/nucleus/synch.c
 +++ b/ksrc/nucleus/synch.c
 @@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
 XN_NO_HANDLE, threadh);
  
if (likely(fastlock == XN_NO_HANDLE)) {
 +  xnsynch_set_owner(synch, thread);
xnthread_inc_rescnt(thread);
xnthread_clear_info(thread,
XNRMID | XNTIMEO | XNBREAK);
 @@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)
  
 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));
 
 -	lastowner = xnpod_current_thread();
 +	lastowner = synch->owner ?: xnpod_current_thread();
 	xnthread_dec_rescnt(lastowner);
 	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
lastownerh = xnthread_handle(lastowner);


 That's maybe another problem, need to check.

 Back to the original issue: with fastlock, kernel space has absolutely
 no clue about how many locks user space may hold - unless someone is
 contending for all those locks. IOW, you can't reliably track resource
 ownership at kernel level without user space helping out. The current
 way it helps (enforced syscalls for XNOTHER threads) is insufficient.

 The thing is: we don't care about knowing how many locks some
 non-current thread owns. What the nucleus wants to know is whether the
 _current user-space_ thread owns a lock, which is enough for the
 autorelax management. This restricted scope makes the logic fine.
 
 Nope, this does not work for threads that undergo policy changes (see
 reply to Gilles).

Is it a really useful use-case?

-- 
Gilles.



Re: [Xenomai-core] Fragile lock usage tracking for auto-relax

2011-05-31 Thread Jan Kiszka
On 2011-05-31 19:58, Gilles Chanteperdrix wrote:
 On 05/31/2011 07:06 PM, Jan Kiszka wrote:
 On 2011-05-31 18:58, Philippe Gerum wrote:
 On Tue, 2011-05-31 at 18:38 +0200, Jan Kiszka wrote:
 On 2011-05-31 18:29, Philippe Gerum wrote:
 On Tue, 2011-05-31 at 13:37 +0200, Jan Kiszka wrote:
 Hi Philippe,

 enabling XENO_OPT_DEBUG_NUCLEUS reveals some shortcomings of the
 in-kernel lock usage tracking via xnthread_t::hrescnt. This BUGON in
 xnsynch_release triggers for RT threads:

  XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);

 RT threads do not balance their lock and unlock syscalls, so their
 counter goes wild quite quickly.

 But just limiting the bug check to XNOTHER threads is not a solution
 either. How would we deal with the counter on scheduling policy changes?

 So my suggestion is to convert the auto-relax feature into a service
 that user space can request, based on a counter that user space maintains
 independently. I.e., we should create another shared word that user space
 itself increments and decrements on lock acquisitions/releases. The
 nucleus just tests it when deciding about the relax on return to user
 space.
 
 But before hacking in that direction, I'd like to hear if it makes
 sense to you.

 At first glance, this does not seem to address the root issue. The
 bottom line is that we should not have any thread release a lock it
 does not actually hold, kthread or not.

 In that respect, xnsynch_release() looks fishy, because it may be called
 from a context which is _not_ the lock owner, but rather the thread that
 is deleting the lock owner; so assuming lastowner == current_thread when
 releasing is wrong.

 At the very least, the following patch would prevent
 xnsynch_release_all_ownerships() from breaking badly. Likewise, the
 fastlock stuff does not track the owner properly in the synchro object.
 We should fix those issues before going further; they may be related to
 the bug described.

 Totally, genuinely, 100% untested.

 diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
 index 3a53527..0785533 100644
 --- a/ksrc/nucleus/synch.c
 +++ b/ksrc/nucleus/synch.c
 @@ -424,6 +424,7 @@ xnflags_t xnsynch_acquire(struct xnsynch *synch, xnticks_t timeout,
XN_NO_HANDLE, threadh);
  
   if (likely(fastlock == XN_NO_HANDLE)) {
 + xnsynch_set_owner(synch, thread);
   xnthread_inc_rescnt(thread);
   xnthread_clear_info(thread,
   XNRMID | XNTIMEO | XNBREAK);
 @@ -718,7 +719,7 @@ struct xnthread *xnsynch_release(struct xnsynch *synch)
  
 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));
 
 -	lastowner = xnpod_current_thread();
 +	lastowner = synch->owner ?: xnpod_current_thread();
 	xnthread_dec_rescnt(lastowner);
 	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
   lastownerh = xnthread_handle(lastowner);


 That's maybe another problem, need to check.

 Back to the original issue: with fastlock, kernel space has absolutely
 no clue about how many locks user space may hold - unless someone is
 contending for all those locks. IOW, you can't reliably track resource
 ownership at kernel level without user space helping out. The current
 way it helps (enforced syscalls for XNOTHER threads) is insufficient.

 The thing is: we don't care about knowing how many locks some
 non-current thread owns. What the nucleus wants to know is whether the
 _current user-space_ thread owns a lock, which is enough for the
 autorelax management. This restricted scope makes the logic fine.

 Nope, this does not work for threads that undergo policy changes (see
 reply to Gilles).
 
 Is it a really useful use-case?

The question is rather whether it is a valid one.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
