[Xenomai-core] [PULL] rt-puts fix, mprotect testsuite

2012-04-04 Thread Jan Kiszka
The following changes since commit 0ef2410a2c9cf7102dead861241bd2d9957e4433:

  Mask signals in rt_print:printer_loop() (2012-04-02 00:16:41 +0200)

are available in the git repository at:
  git://git.xenomai.org/xenomai-jki.git for-upstream

Jan Kiszka (3):
  Append missing newline to rt_[f]puts output
  testsuite: Add rt-print buffer flushes to native error paths
  Add regression test for mprotect on pinned memory

 src/skins/common/rt_print.c|7 ++-
 src/testsuite/regression/native/check.h|2 +
 src/testsuite/regression/native/leaks.c|1 +
 src/testsuite/regression/posix/Makefile.am |2 +-
 src/testsuite/regression/posix/Makefile.in |   15 -
 src/testsuite/regression/posix/check.h |   10 +++
 src/testsuite/regression/posix/mprotect.c  |   97 
 7 files changed, 129 insertions(+), 5 deletions(-)
 create mode 100644 src/testsuite/regression/posix/mprotect.c
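
For context, the scenario the new mprotect regression test exercises can be sketched roughly like this (hypothetical, simplified code; the actual test is src/testsuite/regression/posix/mprotect.c):

/* Rough sketch of the mprotect-on-pinned-memory scenario (hypothetical,
 * simplified; not the actual test code). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4096;
	char *p;

	/* Pin all current and future mappings, as any Xenomai process does. */
	if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0)
		return EXIT_FAILURE;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return EXIT_FAILURE;
	memset(p, 0x5a, len);

	/* Toggling the protection must neither re-introduce COW pages nor
	 * wire the zero page into the pinned mapping. */
	if (mprotect(p, len, PROT_READ) < 0 ||
	    mprotect(p, len, PROT_READ | PROT_WRITE) < 0)
		return EXIT_FAILURE;

	if (p[0] != 0x5a) {
		fprintf(stderr, "content lost after mprotect\n");
		return EXIT_FAILURE;
	}
	p[0] = 0x7f;	/* writing must not fault in a fresh COW page */

	return EXIT_SUCCESS;
}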



Re: [Xenomai-core] [PULL] rt-puts fix, mprotect testsuite

2012-04-04 Thread Jan Kiszka
On 2012-04-04 15:02, Gilles Chanteperdrix wrote:
 On 04/04/2012 02:56 PM, Jan Kiszka wrote:
 The following changes since commit 0ef2410a2c9cf7102dead861241bd2d9957e4433:

   Mask signals in rt_print:printer_loop() (2012-04-02 00:16:41 +0200)

 are available in the git repository at:
   git://git.xenomai.org/xenomai-jki.git for-upstream

 Jan Kiszka (3):
   Append missing newline to rt_[f]puts output
   testsuite: Add rt-print buffer flushes to native error paths
 
 As I said, I do not agree with calling rt_print_flush_buffers outside of
 xenomai libs.

rt_print_flush_buffers is a Xenomai API function that we have exported for
quite a while now. The rt_* functions are about explicit control over when
what is invoked, both in the native skin and in what used to be rtdk.
Also, you cannot avoid this function when you interact with libraries
that are not wrapped.
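
For illustration, the pattern used in the testsuite error paths looks roughly like this; a simplified sketch, not the actual check.h code:

/* Hypothetical error-path sketch: mixing rt_printf() with plain stderr
 * output in a native-skin test.  Simplified, not the testsuite code. */
#include <stdio.h>
#include <stdlib.h>
#include <rtdk.h>
#include <native/task.h>

static void fail(const char *msg, int status)
{
	/* Drop the mode-switch warning for the clean-up path ... */
	rt_task_set_mode(T_WARNSW, 0, NULL);
	/* ... and push pending rt_printf() output before the plain,
	 * unwrapped fprintf() below, so messages come out in order. */
	rt_print_flush_buffers();
	fprintf(stderr, "FAILURE: %s returned %d\n", msg, status);
	exit(EXIT_FAILURE);
}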

That said, I can fix that minor issue in leaks.c differently if you insist.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PULL] rt-puts fix, mprotect testsuite

2012-04-04 Thread Jan Kiszka
On 2012-04-04 15:23, Gilles Chanteperdrix wrote:
 On 04/04/2012 03:12 PM, Jan Kiszka wrote:
 On 2012-04-04 15:02, Gilles Chanteperdrix wrote:
 On 04/04/2012 02:56 PM, Jan Kiszka wrote:
 The following changes since commit 
 0ef2410a2c9cf7102dead861241bd2d9957e4433:

   Mask signals in rt_print:printer_loop() (2012-04-02 00:16:41 +0200)

 are available in the git repository at:
   git://git.xenomai.org/xenomai-jki.git for-upstream

 Jan Kiszka (3):
   Append missing newline to rt_[f]puts output
   testsuite: Add rt-print buffer flushes to native error paths

 As I said, I do not agree with calling rt_print_flush_buffers outside of
 xenomai libs.

 rt_print_flush_buffers is a Xenomai API function that we have exported for
 quite a while now. The rt_* functions are about explicit control over when
 what is invoked, both in the native skin and in what used to be rtdk.
 Also, you cannot avoid this function when you interact with libraries
 that are not wrapped.

 That said, I can fix that minor issue in leaks.c differently if you insist.
 
 I added rt_print_flush_buffers for Xenomai 2.6.0 in order to implement
 systematic wrapping of printf by the posix skin; my intent was not really
 to export it. From my point of view, having to call this flush
 function all over the place reveals a problem in the application. If you
 always use printf or always use rt_printf, you do not need to call this
 function.

Still, there are those use cases where you cannot replace the print
functions. So it remains useful.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : Add regression test for mprotect on pinned memory

2012-04-03 Thread Jan Kiszka
On 2012-04-02 16:35, Gilles Chanteperdrix wrote:
 On 04/02/2012 04:09 PM, GIT version control wrote:
 Module: xenomai-jki
 Branch: for-upstream
 Commit: 410e90d085d21dc913f8724efafe6ae75bd3c952
 URL:
 http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=410e90d085d21dc913f8724efafe6ae75bd3c952

 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Fri Mar 30 18:06:27 2012 +0200

 Add regression test for mprotect on pinned memory

 This tests both the original issue of mprotect reintroducing COW pages
 to Xenomai processes and the recently fixed zero page corruption.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 +static void check_inner(const char *fn, int line, const char *msg,
 +int status, int expected)
 +{
 +if (status == expected)
 +return;
 +
 +rt_task_set_mode(T_WARNSW, 0, NULL);
 +rt_print_flush_buffers();
 (...)
 +static void check_value_inner(const char *fn, int line, const char *msg,
 +  int value, int expected)
 +{
 +if (value == expected)
 +return;
 +
 +rt_task_set_mode(T_WARNSW, 0, NULL);
 +rt_print_flush_buffers();
 (...)
 +void sigdebug_handler(int sig, siginfo_t *si, void *context)
 +{
 +unsigned int reason = si->si_value.sival_int;
 +
 +rt_print_flush_buffers();
 (...)
 +
 +rt_task_set_mode(T_WARNSW, 0, NULL);
 +rt_print_flush_buffers();
 
 Maybe you could use posix skin's printf instead of putting calls to
 rt_print_flush_buffers all over the place? I did not mean for this call
 to be exported, I only added it for internal use by the posix skin.
 

Could be done, likely together with a complete switch to posix.

I could also start to use the check_* wrappers that I just discovered.
BTW, the native version lacks that flush unless it's used in
native+posix context. I will write a fix.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Using xenomai with standard ethernet driver

2012-03-23 Thread Jan Kiszka
On 2012-03-22 17:37, Roberto Bielli wrote:
 Let me explain the problem better.
 
 I want to use the Ethernet driver raw, without all the abstraction
 layers (TCP, sockets, etc.), and my Xenomai application must link to
 the raw driver without entering secondary mode.
 
 Do I have to change the driver interface to Xenomai calls (rtdm_..),
 or is there another way?

Did you check if RTnet already has support for your NIC? Then you are
fine with a stack of rtnet.ko, rtpacket.ko, and your-nic.ko.

If not, there are two options:
 - add an RTnet driver for your hardware
 - implement some home-brewed stack

The second option may only look different at first sight. But,
specifically if your application is not kernel-hosted but a proper
userspace app, RTnet provides quite a few useful building blocks for
your scenario.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Using xenomai with standard ethernet driver

2012-03-23 Thread Jan Kiszka
On 2012-03-23 12:18, Roberto Bielli wrote:
 Hi Jan,
 
 I want to send and receive custom raw packets over Ethernet.
 Is that possible with RTnet, or does it send only RTnet packets?
 
 Thanks for your response.

RTnet is modular, and if you use the stack as sketched, you are free to
define the payload of the sent Ethernet frames on your own.
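
A minimal sketch of what that looks like from user space: plain PF_PACKET usage with an arbitrary EtherType. The assumption here is that rtpacket.ko is loaded and the application is wrapped by the posix skin so these calls end up in RTnet; the EtherType and interface index below are placeholders.

/* Sketch: send one raw Ethernet frame with a custom EtherType. */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <net/ethernet.h>
#include <linux/if_packet.h>

#define MY_ETHERTYPE	0x88b5	/* IEEE 802.1 local experimental */

static int send_raw(int ifindex, const unsigned char dst[ETH_ALEN])
{
	unsigned char frame[ETH_ZLEN];
	struct ether_header *eh = (struct ether_header *)frame;
	struct sockaddr_ll addr;
	int fd, ret;

	fd = socket(PF_PACKET, SOCK_RAW, htons(MY_ETHERTYPE));
	if (fd < 0)
		return -1;

	memset(frame, 0, sizeof(frame));
	memcpy(eh->ether_dhost, dst, ETH_ALEN);
	eh->ether_type = htons(MY_ETHERTYPE);
	/* everything after the header is our own payload format */

	memset(&addr, 0, sizeof(addr));
	addr.sll_family = AF_PACKET;
	addr.sll_ifindex = ifindex;
	addr.sll_halen = ETH_ALEN;
	memcpy(addr.sll_addr, dst, ETH_ALEN);

	ret = sendto(fd, frame, sizeof(frame), 0,
		     (struct sockaddr *)&addr, sizeof(addr));
	close(fd);
	return ret < 0 ? -1 : 0;
}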

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Using xenomai with standard ethernet driver

2012-03-22 Thread Jan Kiszka
On 2012-03-21 17:48, Roberto Bielli wrote:
 Hi,
 
 I want to use Xenomai for sending standard TCP/IP packets.
 Is it possible to rewrite the Ethernet driver for Xenomai without using
 RTnet?
 Or, if I use RTnet, is there a way to send standard TCP packets instead
 of RTnet packets?

Is the TCP/IP communication itself part of your time-critical path, or
could it run with lower priority as well?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



[Xenomai-core] [PATCH forge] Fix build for relative invocations of configure

2012-02-07 Thread Jan Kiszka
This fixes build setups like '../configure'.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 configure.in |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/configure.in b/configure.in
index c0a7d17..0bdced8 100644
--- a/configure.in
+++ b/configure.in
@@ -547,7 +547,7 @@ LD_FILE_OPTION=$ac_cv_ld_file_option
 AC_SUBST(LD_FILE_OPTION)
 
 if test x$rtcore_type = xcobalt; then
-   XENO_USER_CFLAGS="-I$srcdir/include/cobalt $XENO_USER_CFLAGS"
+   XENO_USER_CFLAGS="-I`cd $srcdir && pwd`/include/cobalt $XENO_USER_CFLAGS"
    if [[ $ac_cv_ld_file_option = yes ]]; then
        XENO_POSIX_WRAPPERS="-Wl,@`cd $srcdir && pwd`/lib/cobalt/posix.wrappers"
else
-- 
1.7.3.4



Re: [Xenomai-core] [PATCH] Only export required CFLAGS via xeno-config

2012-02-07 Thread Jan Kiszka
On 2012-02-07 17:18, Gilles Chanteperdrix wrote:
 On 02/03/2012 03:50 PM, Jan Kiszka wrote:
 -Werror-implicit-function-declaration is not compatible with C++, and
 also decisions about -Wall and -pipe should be left to the application.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
 Had to revert this patch, it causes a build failure when cross-compiling
 (for ARM, I do not know if it matters).

-pipe? Or a weird compiler bug? -Wsomething can't make a difference.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Built-in libxenomai dependency

2012-02-07 Thread Jan Kiszka
On 2012-02-07 17:19, Gilles Chanteperdrix wrote:
 On 02/01/2012 09:21 PM, Gilles Chanteperdrix wrote:
 On 02/01/2012 04:50 PM, Jan Kiszka wrote:
 On 2012-02-01 16:38, Gilles Chanteperdrix wrote:
 On 02/01/2012 04:25 PM, Jan Kiszka wrote:
 On 2012-02-01 16:17, Gilles Chanteperdrix wrote:
 On 02/01/2012 03:37 PM, Jan Kiszka wrote:
 Hi,

 don't remember anymore: Is there any subtle reason that prevents a
 change like

 diff --git a/src/skins/native/Makefile.am b/src/skins/native/Makefile.am
 index 39eaaed..4cc8859 100644
 --- a/src/skins/native/Makefile.am
 +++ b/src/skins/native/Makefile.am
 @@ -22,6 +22,9 @@ libnative_la_SOURCES = \
 wrappers.c \
 wrappers.h
  
 +libnative_la_LIBADD = \
 +   ../common/libxenomai.la
 +
  libnative_la_CPPFLAGS = \
 @XENO_USER_CFLAGS@ \
 -I$(top_srcdir)/include
 diff --git a/src/skins/rtdm/Makefile.am b/src/skins/rtdm/Makefile.am
 index 8ad74be..2dc0a90 100644
 --- a/src/skins/rtdm/Makefile.am
 +++ b/src/skins/rtdm/Makefile.am
 @@ -8,6 +8,9 @@ librtdm_la_SOURCES = \
 core.c \
 init.c
  
 +librtdm_la_LIBADD = \
 +   ../common/libxenomai.la
 +
  librtdm_la_CPPFLAGS = \
 @XENO_USER_CFLAGS@ \
 -I$(top_srcdir)/include
 diff --git a/src/testsuite/latency/Makefile.am 
 b/src/testsuite/latency/Makefile.am
 index c772c26..6534df5 100644
 --- a/src/testsuite/latency/Makefile.am
 +++ b/src/testsuite/latency/Makefile.am
 @@ -14,5 +14,4 @@ latency_LDFLAGS = @XENO_USER_LDFLAGS@
  latency_LDADD = \
 ../../skins/native/libnative.la \
 ../../skins/rtdm/librtdm.la \
 -   ../../skins/common/libxenomai.la \
 -lpthread -lm

 i.e. that we let the skin libraries depend on libxenomai and then remove
 the explicit dependency from our binaries and the xeno-config output?
 Is there some ordering issue again (we have -lskin before -lxenomai
 in the ldflags).

 If possible, this would allow for things like dlopen(libnative.so).

 It allows xeno-config result to work both with dynamic and static
 libraries. Static libraries have no dependency system, so, when linking
 a program with libnative.a for instance, without libtool, you still
 have to link it with libxenomai.a.

 OK, part two could stay, but the dependencies should still be added to
 the skin libs - if possible.


 How come you cannot dlopen libnative.so? Does dlopening libxenomai.so
 beforehand not work?

 Dependencies of libnative on libxenomai are not resolved when you open
 the former even if the latter is already loaded. Maybe you can do this
 by pulling in all required symbols one by one manually, haven't tried
 yet. But that would at least be unhandy.

 What about RTLD_GLOBAL ?


 Was possibly the reason, need to check back.

 Still, what prevents stating the existing dependency of libskin on
 libxenomai? The dance above would then be obsolete.

 The change is merged. I took the chance to check that the static build
 still built.

 
 Unfortunately this change also causes a build failure when
 cross-compiling for ARM, so this change was reverted, too.
 

Indeed unfortunate. Any pointers to logs?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] Built-in libxenomai dependency

2012-02-07 Thread Jan Kiszka
On 2012-02-07 17:28, Gilles Chanteperdrix wrote:
 This causes a -L/usr/lib to be added on the link-edit command line,
 which makes the link find /usr/lib/libpthread.so instead of the
 cross-compiler's one, and fail.

What does libxenomai.la look like? Is it different from a native build?

Hmm, there must be some valid way to express library dependencies inside
your package.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



[Xenomai-core] [PATCH] Only export required CFLAGS via xeno-config

2012-02-03 Thread Jan Kiszka
-Werror-implicit-function-declaration is not compatible with C++, and
also decisions about -Wall and -pipe should be left to the application.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 configure.in |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/configure.in b/configure.in
index b2563ee..2185925 100644
--- a/configure.in
+++ b/configure.in
@@ -448,13 +448,13 @@ else
 fi
 AC_MSG_RESULT([done])
 
-dnl Common CFLAGS and LDFLAGS
-XENO_USER_CFLAGS="-D_GNU_SOURCE -D_REENTRANT -Wall -Werror-implicit-function-declaration -pipe"
-XENO_USER_LDFLAGS=
+dnl Exported CFLAGS and LDFLAGS, shared with internal flags
+XENO_USER_APP_CFLAGS="-D_GNU_SOURCE -D_REENTRANT"
+XENO_USER_APP_LDFLAGS=
 
-dnl Exported CFLAGS and LDFLAGS, may be enhanced per-arch below
-XENO_USER_APP_CFLAGS=$XENO_USER_CFLAGS
-XENO_USER_APP_LDFLAGS=$XENO_USER_LDFLAGS
+dnl Internal CFLAGS and LDFLAGS, may be enhanced per-arch below
+XENO_USER_CFLAGS="$XENO_USER_CFLAGS -Wall -Werror-implicit-function-declaration -pipe"
+XENO_USER_LDFLAGS=$XENO_USER_APP_LDFLAGS
 
 case $XENO_TARGET_ARCH in
  x86)
-- 
1.7.3.4



Re: [Xenomai-core] Built-in libxenomai dependency

2012-02-01 Thread Jan Kiszka
On 2012-02-01 16:17, Gilles Chanteperdrix wrote:
 On 02/01/2012 03:37 PM, Jan Kiszka wrote:
 Hi,

 don't remember anymore: Is there any subtle reason that prevents a
 change like

 diff --git a/src/skins/native/Makefile.am b/src/skins/native/Makefile.am
 index 39eaaed..4cc8859 100644
 --- a/src/skins/native/Makefile.am
 +++ b/src/skins/native/Makefile.am
 @@ -22,6 +22,9 @@ libnative_la_SOURCES = \
  wrappers.c \
  wrappers.h
  
 +libnative_la_LIBADD = \
 +../common/libxenomai.la
 +
  libnative_la_CPPFLAGS = \
  @XENO_USER_CFLAGS@ \
  -I$(top_srcdir)/include
 diff --git a/src/skins/rtdm/Makefile.am b/src/skins/rtdm/Makefile.am
 index 8ad74be..2dc0a90 100644
 --- a/src/skins/rtdm/Makefile.am
 +++ b/src/skins/rtdm/Makefile.am
 @@ -8,6 +8,9 @@ librtdm_la_SOURCES = \
  core.c \
  init.c
  
 +librtdm_la_LIBADD = \
 +../common/libxenomai.la
 +
  librtdm_la_CPPFLAGS = \
  @XENO_USER_CFLAGS@ \
  -I$(top_srcdir)/include
 diff --git a/src/testsuite/latency/Makefile.am 
 b/src/testsuite/latency/Makefile.am
 index c772c26..6534df5 100644
 --- a/src/testsuite/latency/Makefile.am
 +++ b/src/testsuite/latency/Makefile.am
 @@ -14,5 +14,4 @@ latency_LDFLAGS = @XENO_USER_LDFLAGS@
  latency_LDADD = \
  ../../skins/native/libnative.la \
  ../../skins/rtdm/librtdm.la \
 -../../skins/common/libxenomai.la \
  -lpthread -lm

 i.e. that we let the skin libraries depend on libxenomai and then remove
 the explicit dependency from our binaries and the xeno-config output?
 Is there some ordering issue again (we have -lskin before -lxenomai
 in the ldflags).

 If possible, this would allow for things like dlopen(libnative.so).
 
 It allows xeno-config result to work both with dynamic and static
 libraries. Static libraries have no dependency system, so, when linking
 a program with libnative.a for instance, without libtool, you still
 have to link it with libxenomai.a.

OK, part two could stay, but the dependencies should still be added to
the skin libs - if possible.

 
 How come you cannot dlopen libnative.so? Does dlopening libxenomai.so
 beforehand not work?

Dependencies of libnative on libxenomai are not resolved when you open
the former even if the latter is already loaded. Maybe you can do this
by pulling in all required symbols one by one manually; I haven't tried
yet. But that would at least be unhandy.
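
For reference, the dlopen() sequence under discussion would look roughly like this; a sketch only, and whether RTLD_GLOBAL alone suffices is exactly the open question:

/* Sketch of the dlopen() scenario discussed above (library names are
 * assumed to be resolvable via the default search path). */
#include <stdio.h>
#include <dlfcn.h>

static void *open_native_skin(void)
{
	void *common, *native;

	/* Export libxenomai's symbols globally so that libnative's
	 * undefined references could be resolved against them ... */
	common = dlopen("libxenomai.so", RTLD_NOW | RTLD_GLOBAL);
	if (!common) {
		fprintf(stderr, "%s\n", dlerror());
		return NULL;
	}

	/* ... which is only needed because libnative does not record its
	 * dependency on libxenomai. With the LIBADD change above,
	 * dlopen("libnative.so") alone would pull it in. */
	native = dlopen("libnative.so", RTLD_NOW);
	if (!native)
		fprintf(stderr, "%s\n", dlerror());
	return native;
}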

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PATCH] Add sigdebug unit test

2012-01-26 Thread Jan Kiszka
On 2012-01-25 19:05, Jan Kiszka wrote:
 On 2012-01-25 18:44, Gilles Chanteperdrix wrote:
 On 01/25/2012 06:10 PM, Jan Kiszka wrote:
 On 2012-01-25 18:02, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:52 PM, Jan Kiszka wrote:
 On 2012-01-25 17:47, Jan Kiszka wrote:
 On 2012-01-25 17:35, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:21 PM, Jan Kiszka wrote:
 We had two regressions in this code recently. So test all 6 possible
 SIGDEBUG reasons, or 5 if the watchdog is not available.

 Ok for this test, with a few remarks:
 - this is a regression test, so should go to
 src/testsuite/regression(/native), and should be added to the
 xeno-regression-test

 What are unit test for (as they are defined here)? Looks a bit 
 inconsistent.

 I put under regression all the tests I have which corresponded to
 things that failed one time or another in xenomai past. Maybe we could
 move unit tests under regression.


 - we already have a regression test for the watchdog called mayday.c,
 which tests the second watchdog action, please merge mayday.c with
 sigdebug.c (mayday.c also allows checking the disassembly of the code in
 the mayday page, a nice feature)

 It seems to have failed in that important last discipline. Need to check
 why.

 Because it didn't check the page content for correctness. But that's now
 done via the new watchdog test. I can keep the debug output, but the
 watchdog test of mayday looks obsolete to me. Am I missing something?

 The watchdog does two things: it first sends a SIGDEBUG, then if the
 application is still spinning, it sends a SIGSEGV. As far as I
 understood, you test tests the first case, and mayday tests the second
 case, so, I agree that mayday should be removed, but whatever it tests
 should be integrated in the sigdebug test.


 Err... SIGSEGV is not a feature, it was the bug I fixed today. :) So the
 test case actually specified a bug as correct behavior.

 The fallback case is in fact killing the RT task as before. But I'm
 unsure right now: will this leave the system always in a clean state
 behind?

 The test case being a test case and doing nothing particular, I do not
 see what could go wrong. And if something goes wrong, then it needs fixing.
 
 Well, if you kill an RT task while it's running in the kernel, you risk
 inconsistent system states (held mutexes etc.). In this case the task is
 supposed to spin in user space. If that is always safe, let's implement
 the test.

Had a closer look: These days the two-stage killing is only useful to
catch endless loops in the kernel. User space tasks can't get around
being migrated on watchdog events, even when SIGDEBUG is ignored.

To trigger the enforced task termination without leaving any broken
states behind, there is one option: rt_task_spin. Surprisingly for me,
it actually spins in the kernel, thus triggers the second level if
waiting long enough. I wonder, though, if that behavior shouldn't be
improved, i.e. the spinning loop closed in user space - which would
take away that option again.
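
A hypothetical snippet of such a trigger (assuming rt_timer_spin() from native/timer.h is the kernel-side spin service referred to as rt_task_spin above, and a watchdog timeout of a few seconds):

/* Hypothetical trigger: busy-spin inside the nucleus long enough to
 * trip the watchdog's second stage. */
#include <native/task.h>
#include <native/timer.h>

static void spinner(void *cookie)
{
	(void)cookie;
	/* The spin loop currently runs in kernel space, so the task does
	 * not return to user space and cannot be migrated early. */
	rt_timer_spin(10000000000ULL);	/* 10 s, in nanoseconds */
}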

Thoughts?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PATCH] Add sigdebug unit test

2012-01-26 Thread Jan Kiszka
On 2012-01-26 12:20, Philippe Gerum wrote:
 On 01/26/2012 11:36 AM, Jan Kiszka wrote:
 On 2012-01-25 19:05, Jan Kiszka wrote:
 On 2012-01-25 18:44, Gilles Chanteperdrix wrote:
 On 01/25/2012 06:10 PM, Jan Kiszka wrote:
 On 2012-01-25 18:02, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:52 PM, Jan Kiszka wrote:
 On 2012-01-25 17:47, Jan Kiszka wrote:
 On 2012-01-25 17:35, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:21 PM, Jan Kiszka wrote:
 We had two regressions in this code recently. So test all 6
 possible
 SIGDEBUG reasons, or 5 if the watchdog is not available.

 Ok for this test, with a few remarks:
 - this is a regression test, so should go to
 src/testsuite/regression(/native), and should be added to the
 xeno-regression-test

 What are unit test for (as they are defined here)? Looks a bit
 inconsistent.

 I put under regression all the tests I have which corresponded to
 things that failed one time or another in xenomai past. Maybe we
 could
 move unit tests under regression.


 - we already have a regression test for the watchdog called
 mayday.c,
 which tests the second watchdog action, please merge mayday.c with
 sigdebug.c (mayday.c also allows checking the disassembly of
 the code in
 the mayday page, a nice feature)

 It seems to have failed in that important last discipline. Need
 to check
 why.

 Because it didn't check the page content for correctness. But
 that's now
 done via the new watchdog test. I can keep the debug output, but the
 watchdog test of mayday looks obsolete to me. Am I missing
 something?

 The watchdog does two things: it first sends a SIGDEBUG, then if the
 application is still spinning, it sends a SIGSEGV. As far as I
 understood, you test tests the first case, and mayday tests the
 second
 case, so, I agree that mayday should be removed, but whatever it
 tests
 should be integrated in the sigdebug test.


 Err... SIGSEGV is not a feature, it was the bug I fixed today. :)
 So the
 test case actually specified a bug as correct behavior.

 The fallback case is in fact killing the RT task as before. But I'm
 unsure right now: will this leave the system always in a clean state
 behind?

 The test case being a test case and doing nothing particular, I do not
 see what could go wrong. And if something goes wrong, then it needs
 fixing.

 Well, if you kill an RT task while it's running in the kernel, you risk
 inconsistent system states (held mutexes etc.). In this case the task is
 supposed to spin in user space. If that is always safe, let's implement
 the test.

 Had a closer look: These days the two-stage killing is only useful to
 catch endless loops in the kernel. User space tasks can't get around
 being migrated on watchdog events, even when SIGDEBUG is ignored.

 To trigger the enforced task termination without leaving any broken
 states behind, there is one option: rt_task_spin. Surprisingly for me,
 it actually spins in the kernel, thus triggers the second level if
 waiting long enough. I wonder, though, if that behavior shouldn't be
 improved, ie. the spinning loop be closed in user space - which would
 take away that option again.

 Thoughts?

 
 Tick-based timing is going to be the problem for determining the
 spinning delay, unless we expose it in the vdso on a per-skin basis,
 which won't be pretty.

I see. But we should possibly add some signal-pending || amok test to
that kernel loop. That would also kill my test design, but otherwise it
makes some sense, I guess.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PATCH] Add sigdebug unit test

2012-01-26 Thread Jan Kiszka
On 2012-01-26 13:56, Jan Kiszka wrote:
 To trigger the enforced task termination without leaving any broken
 states behind, there is one option: rt_task_spin. Surprisingly for me,
 it actually spins in the kernel, thus triggers the second level if
 waiting long enough. I wonder, though, if that behavior shouldn't be
 improved, ie. the spinning loop be closed in user space - which would
 take away that option again.

 Thoughts?


 Tick-based timing is going to be the problem for determining the
 spinning delay, unless we expose it in the vdso on a per-skin basis,
 which won't be pretty.
 
 I see. But we should possibly add some signal-pending || amok test to
 that kernel loop. That would also kill my test design, but it makes
 otherwise some sense I guess.

I'm thinking of a change like this:

diff --git a/ksrc/skins/native/syscall.c b/ksrc/skins/native/syscall.c
index acf0375..39204b4 100644
--- a/ksrc/skins/native/syscall.c
+++ b/ksrc/skins/native/syscall.c
@@ -1020,13 +1020,21 @@ static int __rt_timer_inquire(struct pt_regs *regs)
 
 static int __rt_timer_spin(struct pt_regs *regs)
 {
+   xnthread_t *thread = xnpod_current_thread();
+   struct task_struct *p = current;
+   RTIME etime;
RTIME ns;
 
	if (__xn_safe_copy_from_user(&ns, (void __user *)__xn_reg_arg1(regs),
 sizeof(ns)))
return -EFAULT;
 
-   rt_timer_spin(ns);
+   etime = xnarch_get_cpu_tsc() + xnarch_ns_to_tsc(ns);
+   while ((SRTIME)(xnarch_get_cpu_tsc() - etime) < 0) {
+   if (signal_pending(p) || xnthread_amok_p(thread))
+   return -EINTR;
+   cpu_relax();
+   }
 
return 0;
 }

Regarding testability of the second watchdog stage: we could add a
syscall to sigtest_module, e.g., which has the current rt_timer_spin
semantics.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PATCH] Add sigdebug unit test

2012-01-26 Thread Jan Kiszka
On 2012-01-26 15:55, Gilles Chanteperdrix wrote:
 On 01/26/2012 11:36 AM, Jan Kiszka wrote:
 On 2012-01-25 19:05, Jan Kiszka wrote:
 On 2012-01-25 18:44, Gilles Chanteperdrix wrote:
 On 01/25/2012 06:10 PM, Jan Kiszka wrote:
 On 2012-01-25 18:02, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:52 PM, Jan Kiszka wrote:
 On 2012-01-25 17:47, Jan Kiszka wrote:
 On 2012-01-25 17:35, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:21 PM, Jan Kiszka wrote:
 We had two regressions in this code recently. So test all 6 possible
 SIGDEBUG reasons, or 5 if the watchdog is not available.

 Ok for this test, with a few remarks:
 - this is a regression test, so should go to
 src/testsuite/regression(/native), and should be added to the
 xeno-regression-test

 What are unit test for (as they are defined here)? Looks a bit 
 inconsistent.

 I put under regression all the tests I have which corresponded to
 things that failed one time or another in xenomai past. Maybe we could
 move unit tests under regression.


 - we already have a regression test for the watchdog called mayday.c,
 which tests the second watchdog action, please merge mayday.c with
 sigdebug.c (mayday.c also allows checking the disassembly of the code 
 in
 the mayday page, a nice feature)

 It seems to have failed in that important last discipline. Need to 
 check
 why.

 Because it didn't check the page content for correctness. But that's now
 done via the new watchdog test. I can keep the debug output, but the
 watchdog test of mayday looks obsolete to me. Am I missing something?

 The watchdog does two things: it first sends a SIGDEBUG, then if the
 application is still spinning, it sends a SIGSEGV. As far as I
 understood, you test tests the first case, and mayday tests the second
 case, so, I agree that mayday should be removed, but whatever it tests
 should be integrated in the sigdebug test.


 Err... SIGSEGV is not a feature, it was the bug I fixed today. :) So the
 test case actually specified a bug as correct behavior.

 The fallback case is in fact killing the RT task as before. But I'm
 unsure right now: will this leave the system always in a clean state
 behind?

 The test case being a test case and doing nothing particular, I do not
 see what could go wrong. And if something goes wrong, then it needs fixing.

 Well, if you kill an RT task while it's running in the kernel, you risk
 inconsistent system states (held mutexes etc.). In this case the task is
 supposed to spin in user space. If that is always safe, let's implement
 the test.

 Had a closer look: These days the two-stage killing is only useful to
 catch endless loops in the kernel. User space tasks can't get around
 being migrated on watchdog events, even when SIGDEBUG is ignored.

 To trigger the enforced task termination without leaving any broken
 states behind, there is one option: rt_task_spin. Surprisingly for me,
 it actually spins in the kernel, thus triggers the second level if
 waiting long enough. I wonder, though, if that behavior shouldn't be
 improved, ie. the spinning loop be closed in user space - which would
 take away that option again.

 Thoughts?
 
 You can also call, in an infinite loop, a Xenomai syscall which causes a
 switch to primary mode but fails.

Nope, we would be migrated to secondary on xnthread_amok_p when
returning to user mode. We need a true kernel loop.

Jan.

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PATCH] Add sigdebug unit test

2012-01-26 Thread Jan Kiszka
On 2012-01-26 15:36, Jan Kiszka wrote:
 Regarding testability of the second watchdog state: We could add a
 syscall to sigtest_module e.g which has the current rt_timer_spin
 semantics.

Do you think this makes sense? Or would some other testing driver be
better? Then I would go in that direction.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PATCH] Add sigdebug unit test

2012-01-26 Thread Jan Kiszka
On 2012-01-26 16:52, Gilles Chanteperdrix wrote:
 On 01/26/2012 04:06 PM, Jan Kiszka wrote:
 On 2012-01-26 15:55, Gilles Chanteperdrix wrote:
 On 01/26/2012 11:36 AM, Jan Kiszka wrote:
 On 2012-01-25 19:05, Jan Kiszka wrote:
 On 2012-01-25 18:44, Gilles Chanteperdrix wrote:
 On 01/25/2012 06:10 PM, Jan Kiszka wrote:
 On 2012-01-25 18:02, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:52 PM, Jan Kiszka wrote:
 On 2012-01-25 17:47, Jan Kiszka wrote:
 On 2012-01-25 17:35, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:21 PM, Jan Kiszka wrote:
 We had two regressions in this code recently. So test all 6 
 possible
 SIGDEBUG reasons, or 5 if the watchdog is not available.

 Ok for this test, with a few remarks:
 - this is a regression test, so should go to
 src/testsuite/regression(/native), and should be added to the
 xeno-regression-test

 What are unit test for (as they are defined here)? Looks a bit 
 inconsistent.

 I put under regression all the tests I have which corresponded to
 things that failed one time or another in xenomai past. Maybe we could
 move unit tests under regression.


 - we already have a regression test for the watchdog called 
 mayday.c,
 which tests the second watchdog action, please merge mayday.c with
 sigdebug.c (mayday.c also allows checking the disassembly of the 
 code in
 the mayday page, a nice feature)

 It seems to have failed in that important last discipline. Need to 
 check
 why.

 Because it didn't check the page content for correctness. But that's 
 now
 done via the new watchdog test. I can keep the debug output, but the
 watchdog test of mayday looks obsolete to me. Am I missing something?

 The watchdog does two things: it first sends a SIGDEBUG, then if the
 application is still spinning, it sends a SIGSEGV. As far as I
 understood, you test tests the first case, and mayday tests the second
 case, so, I agree that mayday should be removed, but whatever it tests
 should be integrated in the sigdebug test.


 Err... SIGSEGV is not a feature, it was the bug I fixed today. :) So the
 test case actually specified a bug as correct behavior.

 The fallback case is in fact killing the RT task as before. But I'm
 unsure right now: will this leave the system always in a clean state
 behind?

 The test case being a test case and doing nothing particular, I do not
 see what could go wrong. And if something goes wrong, then it needs 
 fixing.

 Well, if you kill an RT task while it's running in the kernel, you risk
 inconsistent system states (held mutexes etc.). In this case the task is
 supposed to spin in user space. If that is always safe, let's implement
 the test.

 Had a closer look: These days the two-stage killing is only useful to
 catch endless loops in the kernel. User space tasks can't get around
 being migrated on watchdog events, even when SIGDEBUG is ignored.

 To trigger the enforced task termination without leaving any broken
 states behind, there is one option: rt_task_spin. Surprisingly for me,
 it actually spins in the kernel, thus triggers the second level if
 waiting long enough. I wonder, though, if that behavior shouldn't be
 improved, ie. the spinning loop be closed in user space - which would
 take away that option again.

 Thoughts?

 You can also call, in an infinite loop, a Xenomai syscall which causes a
 switch to primary mode but fails.

 Nope, we would be migrated to secondary on xnthread_amok_p when
 returning to user mode. We need a true kernel loop.
 
 But the loop will continue, and the next call to the syscall will cause
 the thread to re-switch to primary mode.

But the watchdog signal pending flag will be cleared on each
migration, thus the watchdog will never enter the second stage.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



[Xenomai-core] [PATCH 2.6] mayday: Fix code setup for x86 and blackfin

2012-01-25 Thread Jan Kiszka
The code structures on x86 were broken because the compiler inserted
alignment padding into the internal layout. The same may have happened on
blackfin. Fix it by applying a packed attribute to the enclosing structures.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Haven't checked Xenomai 3 yet, it may be affected as well.
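
For illustration, the layout problem can be reproduced on the host with a trivial program (illustration only, not part of the patch):

/* Without the outer packed attribute, the compiler inserts a pad byte
 * before 'bug', so the emitted template no longer matches the intended
 * byte sequence. */
#include <stdio.h>

struct unpacked_tmpl {
	struct __attribute__ ((__packed__)) {
		unsigned char op;
		unsigned int imm;
	} mov;
	unsigned short bug;
};

struct __attribute__ ((__packed__)) packed_tmpl {
	struct __attribute__ ((__packed__)) {
		unsigned char op;
		unsigned int imm;
	} mov;
	unsigned short bug;
};

int main(void)
{
	/* Typically prints 8 vs. 7 on x86. */
	printf("unpacked: %zu, packed: %zu\n",
	       sizeof(struct unpacked_tmpl), sizeof(struct packed_tmpl));
	return 0;
}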

 include/asm-blackfin/bits/shadow.h |2 +-
 include/asm-x86/bits/shadow_32.h   |4 ++--
 include/asm-x86/bits/shadow_64.h   |2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/asm-blackfin/bits/shadow.h 
b/include/asm-blackfin/bits/shadow.h
index 3f7e4eb..58cfdc6 100644
--- a/include/asm-blackfin/bits/shadow.h
+++ b/include/asm-blackfin/bits/shadow.h
@@ -84,7 +84,7 @@ static inline void xnarch_setup_mayday_page(void *page)
 * We don't mess with ASTAT here, so no need to save/restore
 * it in handle/fixup code.
 */
-   static const struct {
+   static const struct __attribute__ ((__packed__)) {
struct __attribute__ ((__packed__)) {
u16 op;
u16 imm;
diff --git a/include/asm-x86/bits/shadow_32.h b/include/asm-x86/bits/shadow_32.h
index b7a0e87..43fc2d0 100644
--- a/include/asm-x86/bits/shadow_32.h
+++ b/include/asm-x86/bits/shadow_32.h
@@ -70,7 +70,7 @@ static inline void xnarch_setup_mayday_page(void *page)
 * Also note that if SEP is present, we always assume NPTL on
 * the user side.
 */
-   static const struct {
+   static const struct __attribute__ ((__packed__)) {
struct __attribute__ ((__packed__)) {
u8 op;
u32 imm;
@@ -94,7 +94,7 @@ static inline void xnarch_setup_mayday_page(void *page)
.bug = 0x0b0f,
};
 
-   static const struct {
+   static const struct __attribute__ ((__packed__)) {
struct __attribute__ ((__packed__)) {
u8 op;
u32 imm;
diff --git a/include/asm-x86/bits/shadow_64.h b/include/asm-x86/bits/shadow_64.h
index fc90b9e..3fa6473 100644
--- a/include/asm-x86/bits/shadow_64.h
+++ b/include/asm-x86/bits/shadow_64.h
@@ -66,7 +66,7 @@ static inline void xnarch_setup_mayday_page(void *page)
 * We intentionally don't mess with EFLAGS here, so that we
 * don't have to save/restore it in handle/fixup code.
 */
-   static const struct {
+   static const struct __attribute__ ((__packed__)) {
struct __attribute__ ((__packed__)) {
u8 op;
u32 imm;
-- 
1.7.3.4



[Xenomai-core] [PATCH] Add sigdebug unit test

2012-01-25 Thread Jan Kiszka
We had two regressions in this code recently. So test all 6 possible
SIGDEBUG reasons, or 5 if the watchdog is not available.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 src/testsuite/unit/Makefile.am |   16 +++-
 src/testsuite/unit/sigdebug.c  |  233 
 2 files changed, 248 insertions(+), 1 deletions(-)
 create mode 100644 src/testsuite/unit/sigdebug.c

diff --git a/src/testsuite/unit/Makefile.am b/src/testsuite/unit/Makefile.am
index 1bf5d8d..6e8203d 100644
--- a/src/testsuite/unit/Makefile.am
+++ b/src/testsuite/unit/Makefile.am
@@ -11,7 +11,8 @@ test_PROGRAMS = \
cond-torture-posix \
cond-torture-native \
check-vdso \
-   rtdm
+   rtdm \
+   sigdebug
 
 arith_SOURCES = arith.c arith-noinline.c arith-noinline.h
 
@@ -119,3 +120,16 @@ rtdm_LDADD = \
../../skins/native/libnative.la \
../../skins/common/libxenomai.la \
-lpthread -lrt -lm
+
+sigdebug_SOURCES = sigdebug.c
+
+sigdebug_CPPFLAGS = \
+   @XENO_USER_CFLAGS@ \
+   -I$(top_srcdir)/include
+
+sigdebug_LDFLAGS = @XENO_USER_LDFLAGS@
+
+sigdebug_LDADD = \
+   ../../skins/native/libnative.la \
+   ../../skins/common/libxenomai.la \
+   -lpthread -lm
diff --git a/src/testsuite/unit/sigdebug.c b/src/testsuite/unit/sigdebug.c
new file mode 100644
index 000..57d9beb
--- /dev/null
+++ b/src/testsuite/unit/sigdebug.c
@@ -0,0 +1,233 @@
+/*
+ * Functional testing of unwanted domain switch debugging mechanism.
+ *
+ * Copyright (C) Jan Kiszka  jan.kis...@siemens.com
+ *
+ * Released under the terms of GPLv2.
+ */
+
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <string.h>
+#include <signal.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <pthread.h>
+#include <rtdk.h>
+#include <native/task.h>
+#include <native/mutex.h>
+#include <native/sem.h>
+#include <native/timer.h>
+
+#define WRITE_TEST_SIZE	(4*1024)
+
+unsigned int expected_reason;
+bool sigdebug_received;
+pthread_t rt_task_thread;
+RT_MUTEX prio_invert;
+RT_SEM send_signal;
+char *mem;
+FILE *wd;
+
+static void setup_checkdebug(unsigned int reason)
+{
+   sigdebug_received = false;
+   expected_reason = reason;
+}
+
+static void check_inner(const char *fn, int line, const char *msg,
+   int status, int expected)
+{
+   if (status == expected)
+   return;
+
+   rt_task_set_mode(T_WARNSW, 0, NULL);
+   rt_print_flush_buffers();
+   fprintf(stderr, "FAILURE %s:%d: %s returned %d instead of %d - %s\n",
+   fn, line, msg, status, expected, strerror(-status));
+   exit(EXIT_FAILURE);
+}
+
+static void check_sigdebug_inner(const char *fn, int line, const char *reason)
+{
+   if (sigdebug_received)
+   return;
+
+   rt_task_set_mode(T_WARNSW, 0, NULL);
+   rt_print_flush_buffers();
+   fprintf(stderr, "FAILURE %s:%d: no %s received\n", fn, line, reason);
+   exit(EXIT_FAILURE);
+}
+
+#define check(msg, status, expected) ({				\
+   int __status = status;  \
+   check_inner(__FUNCTION__, __LINE__, msg, __status, expected);   \
+   __status;   \
+})
+
+#define check_no_error(msg, status) ({ \
+   int __status = status;  \
+   check_inner(__FUNCTION__, __LINE__, msg,\
+   __status < 0 ? __status : 0, 0);\
+   __status;   \
+})
+
+#define check_sigdebug_received(reason) do {   \
+   const char *__reason = reason;  \
+   check_sigdebug_inner(__FUNCTION__, __LINE__, __reason); \
+} while (0)
+
+void rt_task_body(void *cookie)
+{
+   RTIME end;
+   int err;
+
+   rt_task_thread = pthread_self();
+
+   rt_printf("syscall\n");
+   setup_checkdebug(SIGDEBUG_MIGRATE_SYSCALL);
+   syscall(-1);
+   check_sigdebug_received(SIGDEBUG_MIGRATE_SYSCALL);
+
+   rt_printf("signal\n");
+   setup_checkdebug(SIGDEBUG_MIGRATE_SIGNAL);
+   err = rt_sem_v(&send_signal);
+   check_no_error("rt_sem_v", err);
+   rt_task_sleep(1000LL);
+   check_sigdebug_received(SIGDEBUG_MIGRATE_SIGNAL);
+
+   rt_printf("relaxed mutex owner\n");
+   setup_checkdebug(SIGDEBUG_MIGRATE_PRIOINV);
+   err = rt_mutex_acquire(&prio_invert, TM_INFINITE);
+   check("rt_mutex_acquire", err, -EINTR);
+   check_sigdebug_received(SIGDEBUG_MIGRATE_PRIOINV);
+
+
+   rt_printf("page fault\n");
+   setup_checkdebug(SIGDEBUG_MIGRATE_FAULT);
+   rt_task_sleep(0);
+   *mem ^= 0xFF;
+   check_sigdebug_received(SIGDEBUG_MIGRATE_FAULT);
+
+   if (wd) {
+   rt_printf(watchdog\n

Re: [Xenomai-core] [PATCH] Add sigdebug unit test

2012-01-25 Thread Jan Kiszka
On 2012-01-25 17:35, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:21 PM, Jan Kiszka wrote:
 We had two regressions in this code recently. So test all 6 possible
 SIGDEBUG reasons, or 5 if the watchdog is not available.
 
 Ok for this test, with a few remarks:
 - this is a regression test, so should go to
 src/testsuite/regression(/native), and should be added to the
 xeno-regression-test

What are unit tests for (as they are defined here)? Looks a bit inconsistent.

 - we already have a regression test for the watchdog called mayday.c,
 which tests the second watchdog action, please merge mayday.c with
 sigdebug.c (mayday.c also allows checking the disassembly of the code in
 the mayday page, a nice feature)

It seems to have failed in that important last discipline. Need to check
why.

 - please make the watchdog test mandatory by default (adding a command
 line option to skip it for instance), the test should fail if the
 watchdog is not enabled, because otherwise, it will be easy to forget
 testing this feature. The wathdog is enabled by default with xenomai 2.6
 anyway.

OK.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PATCH] Add sigdebug unit test

2012-01-25 Thread Jan Kiszka
On 2012-01-25 17:47, Jan Kiszka wrote:
 On 2012-01-25 17:35, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:21 PM, Jan Kiszka wrote:
 We had two regressions in this code recently. So test all 6 possible
 SIGDEBUG reasons, or 5 if the watchdog is not available.

 Ok for this test, with a few remarks:
 - this is a regression test, so should go to
 src/testsuite/regression(/native), and should be added to the
 xeno-regression-test
 
 What are unit test for (as they are defined here)? Looks a bit inconsistent.
 
 - we already have a regression test for the watchdog called mayday.c,
 which tests the second watchdog action, please merge mayday.c with
 sigdebug.c (mayday.c also allows checking the disassembly of the code in
 the mayday page, a nice feature)
 
 It seems to have failed in that important last discipline. Need to check
 why.

Because it didn't check the page content for correctness. But that's now
done via the new watchdog test. I can keep the debug output, but the
watchdog test of mayday looks obsolete to me. Am I missing something?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PATCH] Add sigdebug unit test

2012-01-25 Thread Jan Kiszka
On 2012-01-25 18:44, Gilles Chanteperdrix wrote:
 On 01/25/2012 06:10 PM, Jan Kiszka wrote:
 On 2012-01-25 18:02, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:52 PM, Jan Kiszka wrote:
 On 2012-01-25 17:47, Jan Kiszka wrote:
 On 2012-01-25 17:35, Gilles Chanteperdrix wrote:
 On 01/25/2012 05:21 PM, Jan Kiszka wrote:
 We had two regressions in this code recently. So test all 6 possible
 SIGDEBUG reasons, or 5 if the watchdog is not available.

 Ok for this test, with a few remarks:
 - this is a regression test, so should go to
 src/testsuite/regression(/native), and should be added to the
 xeno-regression-test

 What are unit test for (as they are defined here)? Looks a bit 
 inconsistent.

 I put under regression all the tests I have which corresponded to
 things that failed one time or another in xenomai past. Maybe we could
 move unit tests under regression.


 - we already have a regression test for the watchdog called mayday.c,
 which tests the second watchdog action, please merge mayday.c with
 sigdebug.c (mayday.c also allows checking the disassembly of the code in
 the mayday page, a nice feature)

 It seems to have failed in that important last discipline. Need to check
 why.

 Because it didn't check the page content for correctness. But that's now
 done via the new watchdog test. I can keep the debug output, but the
 watchdog test of mayday looks obsolete to me. Am I missing something?

 The watchdog does two things: it first sends a SIGDEBUG, then if the
 application is still spinning, it sends a SIGSEGV. As far as I
 understood, your test tests the first case, and mayday tests the second
 case, so, I agree that mayday should be removed, but whatever it tests
 should be integrated in the sigdebug test.


 Err... SIGSEGV is not a feature, it was the bug I fixed today. :) So the
 test case actually specified a bug as correct behavior.

 The fallback case is in fact killing the RT task as before. But I'm
 unsure right now: will this leave the system always in a clean state
 behind?
 
 The test case being a test case and doing nothing particular, I do not
 see what could go wrong. And if something goes wrong, then it needs fixing.

Well, if you kill an RT task while it's running in the kernel, you risk
inconsistent system states (held mutexes etc.). In this case the task is
supposed to spin in user space. If that is always safe, let's implement
the test.

Jan





[Xenomai-core] [PATCH] nucleus: Fix relaxed owner check

2012-01-19 Thread Jan Kiszka
Fix a logic inversion introduced by b75cec1938. This both allows the
relaxed-owner check to work again and prevents that we enter an endless
signal storm if XNSWREP happens to be set already at this point (as seen
in the field).

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 ksrc/nucleus/synch.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
index 47bc0c5..8daf7f6 100644
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -1004,7 +1004,7 @@ EXPORT_SYMBOL_GPL(xnsynch_release_all_ownerships);
 void xnsynch_detect_relaxed_owner(struct xnsynch *synch, struct xnthread 
*sleeper)
 {
 	if (xnthread_test_state(sleeper, XNTRAPSW) &&
-	    xnthread_test_info(sleeper, XNSWREP) &&
+	    !xnthread_test_info(sleeper, XNSWREP) &&
 	    xnthread_test_state(synch->owner, XNRELAX)) {
xnthread_set_info(sleeper, XNSWREP);
xnshadow_send_sig(sleeper, SIGDEBUG,
-- 
1.7.3.4



Re: [Xenomai-core] [PULL] KVM awareness / tiny cleanup

2011-12-10 Thread Jan Kiszka
On 2011-10-14 13:20, Jan Kiszka wrote:
 The following changes since commit 31bdadef9018de586bc3fe8de0f37b62b2507785:
 
   testsuite: fix regression tests location (2011-10-14 01:46:08 +0200)
 
 are available in the git repository at:
   git://git.xenomai.org/xenomai-jki.git for-upstream
 
 Jan Kiszka (2):
   nucleus: Privatize __xnpod_reset_thread
   x86: Fire root preemption notifier
 
  include/asm-generic/hal.h |5 +
  include/asm-x86/bits/pod_32.h |2 ++
  include/asm-x86/bits/pod_64.h |2 ++
  include/nucleus/pod.h |2 --
  ksrc/nucleus/pod.c|8 
  5 files changed, 13 insertions(+), 6 deletions(-)
 
 Two trivial changes, though the second one actually adds a noteworthy
 feature to Xenomai: co-location of KVM guests on x86. That feature
 depends on I-pipe changes posted in [1].
 
 Jan
 
 [1] http://thread.gmane.org/gmane.linux.kernel.adeos.general/1898
 

Ping?

I've just pushed a trivial rebase.

Jan





[Xenomai-core] [PULL] KVM awareness / tiny cleanup

2011-10-14 Thread Jan Kiszka
The following changes since commit 31bdadef9018de586bc3fe8de0f37b62b2507785:

  testsuite: fix regression tests location (2011-10-14 01:46:08 +0200)

are available in the git repository at:
  git://git.xenomai.org/xenomai-jki.git for-upstream

Jan Kiszka (2):
  nucleus: Privatize __xnpod_reset_thread
  x86: Fire root preemption notifier

 include/asm-generic/hal.h |5 +
 include/asm-x86/bits/pod_32.h |2 ++
 include/asm-x86/bits/pod_64.h |2 ++
 include/nucleus/pod.h |2 --
 ksrc/nucleus/pod.c|8 
 5 files changed, 13 insertions(+), 6 deletions(-)

Two trivial changes, though the second one actually adds a noteworthy
feature to Xenomai: co-location of KVM guests on x86. That feature
depends on I-pipe changes posted in [1].

Jan

[1] http://thread.gmane.org/gmane.linux.kernel.adeos.general/1898

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PULL] KVM awareness / tiny cleanup

2011-10-14 Thread Jan Kiszka
On 2011-10-14 14:35, Gilles Chanteperdrix wrote:
 On 10/14/2011 01:20 PM, Jan Kiszka wrote:
 The following changes since commit 31bdadef9018de586bc3fe8de0f37b62b2507785:

   testsuite: fix regression tests location (2011-10-14 01:46:08 +0200)

 are available in the git repository at:
   git://git.xenomai.org/xenomai-jki.git for-upstream

 Jan Kiszka (2):
   nucleus: Privatize __xnpod_reset_thread
 
 Are we sure that we do not have any out-of-tree users for this feature?

Out-of-tree would be impossible as that function was not exported.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [RFC 0/1] Class driver for raw Ethernet packets

2011-09-28 Thread Jan Kiszka
On 2011-09-28 10:16, Richard Cochran wrote:
 On Tue, Sep 27, 2011 at 07:25:07PM +0200, Jan Kiszka wrote:

 It manages buffers for you, provides NIC drivers and interfaces to
 either build the higher protocol layers in the kernel or in user space.
 That's the mission, but I would not exclude that there is room for
 improvements (lacking safe copy-to/from user, unneeded RX thread for
 single-user scenarios and possibly more). Still, the Ethercat master
 library folks chose it back then as a platform; maybe you want to ask them.
 
 Getting a little off topic, I wonder, who are these Ethercat folks of
 whom you speak?
 
 I do know of a few open source implementations, but none are based
 Xenomai:
 
 * IgH etherlab
   This is a complete stack in the kernel. Although they claim it works
   with Xenomai, in fact it does not, since it uses regular kernel spin
   locks, etc. However, it could be adapted to work with Xenomai.
 * SOEM
   This is a user space (really simple) stack based on normal raw
   sockets. It could also be adapted to use Xenomai, by adding some
   sort of raw RT socket.
 * OSADL
   This was withdrawn because of license concerns. I never saw the
   code, but I do beleive it was a user space solution using standard
   sockets.
 * Another?
   There once was some C++ program from some institute in the
   Netherlands (or Belgium? can't remember), but it was also withdrawn.
 
 So, did you mean any of these, or yet another implementation?

I meant the first hit for ethercat master library in a search engine. See

http://ethercatmaster.berlios.de/
http://www.fmtc.be/downloads/15_FMTC%20open%20sources.pdf

or ask Peter or Klaas (pe...@thesourceworks.com, klaas.gade...@fmtc.be -
not sure if Klaas is still with fmtc).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [RFC 0/1] Class driver for raw Ethernet packets

2011-09-27 Thread Jan Kiszka
On 2011-09-26 13:41, Richard Cochran wrote:
 On Fri, Sep 23, 2011 at 03:50:42PM +0200, Jan Kiszka wrote:
 On 2011-09-23 13:02, Richard Cochran wrote:
 This patch adds a class driver for raw Ethernet drivers under
 Xenomai. The goal is to support industrial protocols such as EtherCAT
 and IEC 61850, where the stack is a user space program needing
 direct access at the packet level. The class driver offers interfaces
 for registration, buffer management, and packet sending/receiving.

 Although this patch is a kind of first draft, still I have it working
 on the Freescale P2020 with a real world application, with very good
 results. I can post a patch series for the gianfar driver in the ipipe
 tree, if anyone is interested.

 The user space interface is a character device and not a socket, simply
 because my applications will probably never need fancy socket
 options. The class driver could surely be made to offer a socket
 instead, but I think the character is sufficient.

Many interesting interfaces already exist for standard PF_PACKET (e.g.
 time stamping), plus you gain portability this way. So let's not create
 something special here even if it's sufficient for the first use case.
 
 Okay, so if the raw packet driver idea finds acceptance (outside of
 rtnet), then I am willing to recast the thing as a socket.
  

 The class driver is clearly in the wrong directory within the source
 tree, but I put it there just to get started. It really does not fit
 to any of the other drivers, so it probably would need its own place
 under ksrc/drivers.

 New class, new directory.

 However, the key question is how this approach relates to RTnet. Right
 now it's likely comparing apples to onions, but that may change as things
 evolve in the right direction.

 Can you explain a bit about your NIC driver architecture and the
 maintenance strategy? It looks like you are taking an approach of
 patching existing drivers in-tree.
 
 Yes, that is right.
 
 Anything that resolves the
 maintenance pain we have with the RTnet model is already worth
 considering. Did you take this approach intentionally? What pros and
 cons do you see?
 
 So, here is my story. Many months ago I needed a Xenomai program to
 send and receive raw packets. I looked at rtnet and decided that it
 was way too complicated. So, I added a rtdm character device to the
 Linux Ethernet MAC driver, a rather simple hack:
 
  drivers/net/arm/Makefile |4 +
  drivers/net/arm/ixp4xx_eth.c |  316
  +-
  2 files changed, 319 insertions(+), 1 deletions(-)
 
 That worked great. I did not bother to implement more than read/write,
 but I used a normal socket to set the multicast Rx list.
 
 Now, I needed to port that application to a new platform. Again, I
 looked at rtnet (and the recently posted gianfar code), and again I
 came to the conclusion that I was better off with a direct driver
 hack.
 
 Most of those 316 additional lines, above, were rtdm driver boiler
 plate, and I this time I wanted to abstract the code that will surely
 be the same, like device registration, user space send/receive, and
 buffer management. (Because the eTSEC (gianfar) is much more complex,
 and because I wanted hardware packet classification, the driver patch
 is much larger than before.)
 
 I admire the rtnet project. Based on my limited understanding of it,
 if I really needed deterministic Ethernet, then I would write a brand
 new driver for the eTSEC (perhaps using gianfar.c as a reference), but
 I would *not* try and adapt the Linux driver.
 
 Conversely, if I were writing a brand new driver, I would surely offer
  it as an rtnet driver, even though I only need raw packets.
 
 However, for the sole requirement of raw Ethernet, I think my simple
 driver API is much easier to work into existing drivers. I also think
 it is way easier to maintain by rebasing on later changes.
 
 IMHO, the pros of my approach are:
 
 - Simple to implement new drivers.
 
   Compare my rtpacket.h with the rtnet driver headers to see what I
   mean. Or, read rtnet/Documentation/README.drvporting and ask
   yourself, is it easy to port a driver to rtnet?

I would be careful with deriving generic properties from a potentially
lucky first example. Have you already tried to apply your pattern to a
standard PCI NIC, e.g. the common Intel 8257x series?

 
 - Easier to maintain drivers.
 
   Making regular drivers into real time drivers will always be a
   chore. But, with its simple interface, the packet class driver hack
   is way less painful. (Look at my gianfar example. There really are
   not many changes to the Linux driver.)

What is your simple interface? That's what I'm interested in. What is
the pattern to apply on an arbitrary driver to add RT support? How does
interface claiming work (so the RT side does not conflict with Linux)? How
does the configuration work? How do you deal with things like watchdogs,
error handling, IRQ coalescing avoidance, etc.? Hmm

Re: [Xenomai-core] [RFC 0/1] Class driver for raw Ethernet packets

2011-09-27 Thread Jan Kiszka
On 2011-09-27 14:01, Richard Cochran wrote:
 Again, every MAC driver needs to be tastefully and wisely adapted. I
 don't necessarily need to avoid coalescing. The goal (for me) is *not*
 to provide deterministic Ethernet performance. Instead the RT packets
 should just be delivered ASAP.

This is obviously the point I completely missed. And it makes the whole
thing fairly uninteresting IMHO. If you want to do Ethercat, PowerLink
or Profinet (RT), you do need a certain level of determinism along the
*whole* packet path. And for the latter two, you definitely need RT IRQ
support; EtherCAT can be OK to poll in fast setups.

From that POV, your approach is likely OK. But I doubt it's of generic
use, specifically for industrial RT Ethernet.

Jan

PS: You do have a stack, even if you don't like it: driver, packet
layer, application. :)

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [RFC 0/1] Class driver for raw Ethernet packets

2011-09-27 Thread Jan Kiszka
On 2011-09-27 17:10, Richard Cochran wrote:
 On Tue, Sep 27, 2011 at 02:20:43PM +0200, Jan Kiszka wrote:
 On 2011-09-27 14:01, Richard Cochran wrote:
 Again, every MAC driver needs to be tastefully and wisely adapted. I
 don't necessarily need to avoid coalescing. The goal (for me) is *not*
 to provide deterministic Ethernet performance. Instead the RT packets
 should just be delivered ASAP.

 This is obviously the point I completely missed. And it makes the whole
 thing fairly uninteresting IMHO. If you want to do Ethercat, PowerLink
 or Profinet (RT), you do need a certain level of determinism along the
 *whole* packet path. And for the latter two, you definitely need RT IRQ
 support, Ethercat can be OK to poll in fast setups.

 From that POV, your approach is likely OK. But I doubt it's of generic
 use, specifically for industrial RT Ethernet.
 
 So, how does rtnet support EtherCAT?

There was once the EtherCAT Master Library. IIRC, it was discontinued
and removed from the web for non-technical reasons.

 
 Does it support PowerLink and Profinet?

Not that I know, but that's not the point. You said your approach could
provide the technical foundation for such a class of use cases while I
doubt it would work as is.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [RFC 0/1] Class driver for raw Ethernet packets

2011-09-27 Thread Jan Kiszka
On 2011-09-27 18:05, Richard Cochran wrote:
 On Tue, Sep 27, 2011 at 05:16:09PM +0200, Jan Kiszka wrote:
 On 2011-09-27 17:10, Richard Cochran wrote:
 On Tue, Sep 27, 2011 at 02:20:43PM +0200, Jan Kiszka wrote:
 On 2011-09-27 14:01, Richard Cochran wrote:
 Again, every MAC driver needs to be tastefully and wisely adapted. I
 don't necessarily need to avoid coalescing. The goal (for me) is *not*
 to provide deterministic Ethernet performance. Instead the RT packets
 should just be delivered ASAP.

 This is obviously the point I completely missed. And it makes the whole
 thing fairly uninteresting IMHO. If you want to do Ethercat, PowerLink
 or Profinet (RT), you do need a certain level of determinism along the
 *whole* packet path. And for the latter two, you definitely need RT IRQ
 support, Ethercat can be OK to poll in fast setups.

 From that POV, your approach is likely OK. But I doubt its of generic
 use, specifically for industrial RT Ethernet.

 So, how does rtnet support EtherCAT?

 There was once the EtherCAT Master Library. IIRC, it was discontinued
 and removed from the web for non-technical reasons.


 Does it support PowerLink and Profinet?

 Not that I know, but that's not the point. You said your approach could
 provide the technical foundation for such a class of use cases while I
 doubt it would work as is.
 
 But that is the point. Correct me please if I am wrong, but isn't
 rtnet a competitor or alternative to EtherCAT, PowerLink, and
 Profinet?

That's a common misunderstanding: RTnet is a networking stack with many
_optional_ components (like RTmac, RTcfg etc.). I would bet that it's
more frequently used today in minimal setups, i.e. just the core, some
driver, and either PF_PACKET or UDP/IP.

 
 Unless rtnet implements (or helps to implement) these, it is kind of
 silly to say, your way won't work, you should use rtnet instead.
 
 I don't know PowerLink or Profinet, but I do know EtherCAT and IEC
 61850, and those two can surely be implemented on the interface that I
 am talking about.

It works, but it won't give you a deterministic control loop as you
still have Linux in the game.

 
 Also, the interface that I propose does not preclude the use of RTDM
 ISR in the driver, so I wonder what exactly bothers you about it?

That you claim it's simpler based on lacking features - like RT IRQ
support and the associated proper driver integration of that. Maybe it
turns out to be simpler, but that is far from proven yet.

I was simply hoping to collect some new ideas on how to address the driver
maintenance issue in a better way, but without dropping key features
needed for RT networking. Something like: let's add generic RT channels
to Linux upstream drivers and then only patch them to fit RTDM. Not sure
if that works, but it would come with a vision of how to keep things more
maintainable.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [RFC 0/1] Class driver for raw Ethernet packets

2011-09-27 Thread Jan Kiszka
On 2011-09-27 18:26, Jan Kiszka wrote:
 I was simply hoping to collect some new ideas how to address the driver
 maintenance issue in a better way but without dropping key features
 needed for RT networking. Something like let's add generic RT channels
 to Linux upstream drivers and then only patch them to fit RTDM. Not sure
 if that works, but it would come with a vision how to keep things more
 maintainable.

Plus, this could be useful for other scenarios - on PREEMPT-RT.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [RFC 0/1] Class driver for raw Ethernet packets

2011-09-27 Thread Jan Kiszka
On 2011-09-27 19:00, Richard Cochran wrote:
 On Tue, Sep 27, 2011 at 06:26:44PM +0200, Jan Kiszka wrote:
 On 2011-09-27 18:05, Richard Cochran wrote:

 That's a common misunderstanding: RTnet is a networking stack with many
 _optional_ components (like RTmac, RTcfg etc.). I would bet that it's
 more frequently used today in minimal setups, i.e. just the core, some
 driver, and either PF_PACKET or UDP/IP.
 
 I understood about the modular design, but I really want to know if
 rtnet will help me if I want to use one of the industrial Ethernet
 protocols. AFAICT, rtnet really doesn't offer these.
 
 So I'll ask the direct question once again. Does rtnet help me with
 industrial Ethernet (apart from the rtnet protocols), or not?

It manages buffers for you, provides NIC drivers and interfaces to
either build the higher protocol layers in the kernel or in user space.
That's the mission, but I would not exclude that there is room for
improvements (lacking safe copy-to/from user, unneeded RX thread for
single-user scenarios and possibly more). Still, the EtherCAT master
library folks chose it back then as a platform; maybe you want to ask them.

 
 Unless rtnet implements (or helps to implement) these, it is kind of
 silly to say, your way won't work, you should use rtnet instead.

 I don't know PowerLink or Profinet, but I do know EtherCAT and IEC
 61850, and those two can surely be implemented on the interface that I
 am talking about.

 It works, but it won't give you a deterministic control loop as you
 still have Linux in the game.
 
 It really depends on how the driver is written. While my gianfar
 example does make use of normal Linux driver interrupts, it would not
 necessarily have to do so.
 
 I was simply hoping to collect some new ideas how to address the driver
 maintenance issue in a better way but without dropping key features
 needed for RT networking. Something like let's add generic RT channels
 to Linux upstream drivers and then only patch them to fit RTDM. Not sure
 if that works, but it would come with a vision how to keep things more
 maintainable.
 
 Well, can you turn the issue around and convince me that writing an
 rtnet driver is the best way to achieve raw Ethernet packet access?

It would at least avoid having to reinvent user interfaces and buffer
management layers. They may appear simple now, but that's how everything
once started.

 
 You talk about the rtnet driver model, but is it described anywhere?
 
 (BTW rtnet/Documentation/README.drvporting is horrible. It is just a
 random list of 40+ odd points without any sense. That document gave me
 the impression that developing an rtnet driver is a kind of extended
 hack until it starts working.)

I know best. It once worked well, allowed you to more or less
mechanically convert a driver within a few hours, but that's quite a few
years ago. Driver complexity almost exploded since then.

But my goal is not necessarily convincing you to use current RTnet as
is, but to make you think ahead, reusing the experience of this project
if you start something new. That need not replace RTnet immediately, but
it should not block a transition architecturally. The first step to this
remains studying the legacy, even if not using it in the end.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [RFC 0/1] Class driver for raw Ethernet packets

2011-09-27 Thread Jan Kiszka
On 2011-09-27 19:04, Richard Cochran wrote:
 On Tue, Sep 27, 2011 at 06:30:00PM +0200, Jan Kiszka wrote:
 On 2011-09-27 18:26, Jan Kiszka wrote:
 I was simply hoping to collect some new ideas how to address the driver
 maintenance issue in a better way but without dropping key features
 needed for RT networking. Something like let's add generic RT channels
 to Linux upstream drivers and then only patch them to fit RTDM. Not sure
 if that works, but it would come with a vision how to keep things more
 maintainable.

 +this could be useful for other scenarios - on PREEMPT-RT.
 
 (But PREEMPT-RT will make the whole kernel deterministic, right? ;)

Yes. Except where not.

 
 Adding low-latency channels (eg. working against coalescing) will be a
 very hard sell upstream.

That also depends on the invasiveness. Key requirements like separate
packet pools slowly sneak in (for swap over net). Also, separate IRQs
for separate channels on modern NICs may make the split-out smoother -
you may no longer need to disable features that used to affect the same
handler or shared some locks.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [RFC 0/1] Class driver for raw Ethernet packets

2011-09-23 Thread Jan Kiszka
On 2011-09-23 13:02, Richard Cochran wrote:
 This patch adds a class driver for raw Ethernet drivers under
 Xenomai. The goal is to support industrial protocols such as EtherCAT
 and IEC 61850, where the stack is a user space program needing
 direct access at the packet level. The class driver offers interfaces
 for registration, buffer management, and packet sending/receiving.
 
 Although this patch is a kind of first draft, still I have it working
 on the Freescale P2020 with a real world application, with very good
 results. I can post a patch series for the gianfar driver in the ipipe
 tree, if anyone is interested.
 
 The user space interface is a character device and not a socket, simply
 because my applications will probably never need fancy socket
 options. The class driver could surely be made to offer a socket
 instead, but I think the character is sufficient.

Many interesting interfaces already exist for standard PF_PACKET (e.g.
time stamping), plus you gain portability this way. So let's not create
something special here even if it's sufficient for the first use case.
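
For illustration, the plain Linux user-space path looks roughly like this
(minimal sketch, standard PF_PACKET API, no Xenomai involved; the interface
name is just an example):

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* open a raw socket bound to one interface; read()/write() then move
   whole Ethernet frames */
static int open_raw(const char *ifname)
{
	struct sockaddr_ll sll;
	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

	if (fd < 0)
		return -1;

	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_protocol = htons(ETH_P_ALL);
	sll.sll_ifindex = if_nametoindex(ifname);

	if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

Keeping the RT interface source-compatible with this would let applications
move between a vanilla kernel and the RT stack without changes.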

 
 The class driver is clearly in the wrong directory within the source
 tree, but I put it there just to get started. It really does not fit
 to any of the other drivers, so it probably would need its own place
 under ksrc/drivers.

New class, new directory.

However, the key question is how this approach relates to RTnet. Right
now it's likely comparing apples to onions, but that may change as things
evolve in the right direction.

Can you explain a bit about your NIC driver architecture and the
maintenance strategy? It looks like you are taking an approach of
patching existing drivers in-tree. Anything that resolves the
maintenance pain we have with the RTnet model is already worth
considering. Did you take this approach intentionally? What pros and
cons do you see?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] P2020 support for ftrace with ipipe 2.12-01 and xeno 2.5.5.1

2011-09-23 Thread Jan Kiszka
On 2011-09-23 15:58, Jean-Michel Hautbois wrote:
 2011/9/23 Gilles Chanteperdrix gilles.chanteperd...@xenomai.org
 
 On 09/23/2011 11:49 AM, Jean-Michel Hautbois wrote:
 OK, I have more traces (a few :)) :

 I meant the I-pipe tracer alone. The I-pipe tracer instead of other
 ftrace tracers.


 Well, I think it works; I can do a cat /proc/ipipe/trace/max without any
 error in a kernel which doesn't have any other ftrace function.
 Do you have one test in particular in mind?

Check
http://git.kiszka.org/?p=ipipe.git;a=shortlog;h=refs/heads/queues/2.6.35-x86-trace
to get an impression of what is required to get ftrace working on
x86_64. It is not working on x86_32 e.g. as it depends on the arch
providing an NMI-safe, i.e. task-stack-independent current_thread_info().
That is also not yet the case on Power.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Policy switching and XNOTHER maintenance

2011-09-18 Thread Jan Kiszka
On 2011-09-18 16:02, Philippe Gerum wrote:
 On Fri, 2011-09-16 at 22:39 +0200, Gilles Chanteperdrix wrote:
 On 09/16/2011 10:13 PM, Gilles Chanteperdrix wrote:
 On 09/11/2011 04:29 PM, Jan Kiszka wrote:
 On 2011-09-11 16:24, Gilles Chanteperdrix wrote:
 On 09/11/2011 12:50 PM, Jan Kiszka wrote:
 Hi all,

 just looked into the hrescnt issue again, specifically the corner case
 of a shadow thread switching from real-time policy to SCHED_OTHER.

 Doing this while holding a mutex looks invalid.

 Looking at POSIX e.g., is there anything in the spec that makes this
 invalid? If the kernel preserves or established proper priority
 boosting, I do not see what could break in principle.

 It is nothing I would design into some app, but we should somehow handle
 it (doc update or code adjustments).

 If we do not do it, the current code is valid.

 Except for its dependency on XNOTHER which is not updated on RT-NORMAL
 transitions.

 The fact that this update did not take place made the code work. No 
 negative rescnt could happen with that code.

 Anyway, here is a patch to allow switching back from RT to NORMAL, but 
 send a SIGDEBUG to a thread attempting to release a mutex while its 
 counter is already 0. We end up avoiding a big chunk of code that would 
 have been useful for a really strange corner case.


 Here comes version 2:
diff --git a/include/nucleus/sched-idle.h b/include/nucleus/sched-idle.h
index 6399a17..417170f 100644
--- a/include/nucleus/sched-idle.h
+++ b/include/nucleus/sched-idle.h
@@ -39,6 +39,8 @@ extern struct xnsched_class xnsched_class_idle;
 static inline void __xnsched_idle_setparam(struct xnthread *thread,
					    const union xnsched_policy_param *p)
 {
+	if (xnthread_test_state(thread, XNSHADOW))
+		xnthread_clear_state(thread, XNOTHER);
 	thread->cprio = p->idle.prio;
 }
 
diff --git a/include/nucleus/sched-rt.h b/include/nucleus/sched-rt.h
index 71f655c..cc1cefa 100644
--- a/include/nucleus/sched-rt.h
+++ b/include/nucleus/sched-rt.h
@@ -86,6 +86,12 @@ static inline void __xnsched_rt_setparam(struct xnthread *thread,
					  const union xnsched_policy_param *p)
 {
 	thread->cprio = p->rt.prio;
+	if (xnthread_test_state(thread, XNSHADOW)) {
+		if (thread->cprio)
+			xnthread_clear_state(thread, XNOTHER);
+		else
+			xnthread_set_state(thread, XNOTHER);
+	}
 }
 
 static inline void __xnsched_rt_getparam(struct xnthread *thread,
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 9a02e80..d1f 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -1896,16 +1896,6 @@ int __xnpod_set_thread_schedparam(struct xnthread *thread,
 	xnsched_putback(thread);
 
 #ifdef CONFIG_XENO_OPT_PERVASIVE
-	/*
-	 * A non-real-time shadow may upgrade to real-time FIFO
-	 * scheduling, but the latter may never downgrade to
-	 * SCHED_NORMAL Xenomai-wise. In the valid case, we clear
-	 * XNOTHER to reflect the change. Note that we keep handling
-	 * non real-time shadow specifics in higher code layers, not
-	 * to pollute the core scheduler with peculiarities.
-	 */
-	if (sched_class == &xnsched_class_rt && sched_param->rt.prio > 0)
-		xnthread_clear_state(thread, XNOTHER);
 	if (propagate) {
 		if (xnthread_test_state(thread, XNRELAX))
 			xnshadow_renice(thread);
diff --git a/ksrc/nucleus/sched-sporadic.c b/ksrc/nucleus/sched-sporadic.c
index fd37c21..ffc9bab 100644
--- a/ksrc/nucleus/sched-sporadic.c
+++ b/ksrc/nucleus/sched-sporadic.c
@@ -258,6 +258,8 @@ static void xnsched_sporadic_setparam(struct xnthread *thread,
 		}
 	}
 
+	if (xnthread_test_state(thread, XNSHADOW))
+		xnthread_clear_state(thread, XNOTHER);
 	thread->cprio = p->pss.current_prio;
 }
 
diff --git a/ksrc/nucleus/sched-tp.c b/ksrc/nucleus/sched-tp.c
index 43a548e..a2af1d3 100644
--- a/ksrc/nucleus/sched-tp.c
+++ b/ksrc/nucleus/sched-tp.c
@@ -100,6 +100,8 @@ static void xnsched_tp_setparam(struct xnthread *thread,
 {
 	struct xnsched *sched = thread->sched;
 
+	if (xnthread_test_state(thread, XNSHADOW))
+		xnthread_clear_state(thread, XNOTHER);
 	thread->tps = &sched->tp.partitions[p->tp.ptid];
 	thread->cprio = p->tp.prio;
 }
diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
index b956e46..47bc0c5 100644
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -684,9 +684,13 @@ xnsynch_release_thread(struct xnsynch *synch, struct xnthread *lastowner)
 
 	XENO_BUGON(NUCLEUS, !testbits(synch->status, XNSYNCH_OWNER));
 
-	if (xnthread_test_state(lastowner, XNOTHER))
-		xnthread_dec_rescnt(lastowner);
-	XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner) < 0);
+	if (xnthread_test_state(lastowner, XNOTHER)) {
+		if (xnthread_get_rescnt(lastowner

Re: [Xenomai-core] Policy switching and XNOTHER maintenance

2011-09-18 Thread Jan Kiszka
On 2011-09-18 17:10, Philippe Gerum wrote:
 On Sun, 2011-09-18 at 16:34 +0200, Jan Kiszka wrote:
 On 2011-09-18 16:02, Philippe Gerum wrote:
 On Fri, 2011-09-16 at 22:39 +0200, Gilles Chanteperdrix wrote:
 On 09/16/2011 10:13 PM, Gilles Chanteperdrix wrote:
 On 09/11/2011 04:29 PM, Jan Kiszka wrote:
 On 2011-09-11 16:24, Gilles Chanteperdrix wrote:
 On 09/11/2011 12:50 PM, Jan Kiszka wrote:
 Hi all,

 just looked into the hrescnt issue again, specifically the corner case
 of a shadow thread switching from real-time policy to SCHED_OTHER.

 Doing this while holding a mutex looks invalid.

 Looking at POSIX e.g., is there anything in the spec that makes this
 invalid? If the kernel preserves or established proper priority
 boosting, I do not see what could break in principle.

 It is nothing I would design into some app, but we should somehow handle
 it (doc update or code adjustments).

 If we do not do it, the current code is valid.

 Except for its dependency on XNOTHER which is not updated on RT-NORMAL
 transitions.

 The fact that this update did not take place made the code work. No 
 negative rescnt could happen with that code.

 Anyway, here is a patch to allow switching back from RT to NORMAL, but 
 send a SIGDEBUG to a thread attempting to release a mutex while its 
 counter is already 0. We end up avoiding a big chunk of code that would 
 have been useful for a really strange corner case.


 Here comes version 2:
 diff --git a/include/nucleus/sched-idle.h b/include/nucleus/sched-idle.h
 index 6399a17..417170f 100644
 --- a/include/nucleus/sched-idle.h
 +++ b/include/nucleus/sched-idle.h
 @@ -39,6 +39,8 @@ extern struct xnsched_class xnsched_class_idle;
  static inline void __xnsched_idle_setparam(struct xnthread *thread,
   const union xnsched_policy_param *p)
  {
 +  if (xnthread_test_state(thread, XNSHADOW))
 +  xnthread_clear_state(thread, XNOTHER);
thread-cprio = p-idle.prio;
  }
  
 diff --git a/include/nucleus/sched-rt.h b/include/nucleus/sched-rt.h
 index 71f655c..cc1cefa 100644
 --- a/include/nucleus/sched-rt.h
 +++ b/include/nucleus/sched-rt.h
 @@ -86,6 +86,12 @@ static inline void __xnsched_rt_setparam(struct 
 xnthread *thread,
 const union xnsched_policy_param *p)
  {
thread-cprio = p-rt.prio;
 +  if (xnthread_test_state(thread, XNSHADOW)) {
 +  if (thread-cprio)
 +  xnthread_clear_state(thread, XNOTHER);
 +  else
 +  xnthread_set_state(thread, XNOTHER);
 +  }
  }
  
  static inline void __xnsched_rt_getparam(struct xnthread *thread,
 diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
 index 9a02e80..d1f 100644
 --- a/ksrc/nucleus/pod.c
 +++ b/ksrc/nucleus/pod.c
 @@ -1896,16 +1896,6 @@ int __xnpod_set_thread_schedparam(struct xnthread 
 *thread,
xnsched_putback(thread);
  
  #ifdef CONFIG_XENO_OPT_PERVASIVE
 -  /*
 -   * A non-real-time shadow may upgrade to real-time FIFO
 -   * scheduling, but the latter may never downgrade to
 -   * SCHED_NORMAL Xenomai-wise. In the valid case, we clear
 -   * XNOTHER to reflect the change. Note that we keep handling
 -   * non real-time shadow specifics in higher code layers, not
 -   * to pollute the core scheduler with peculiarities.
 -   */
 -  if (sched_class == xnsched_class_rt  sched_param-rt.prio  0)
 -  xnthread_clear_state(thread, XNOTHER);
if (propagate) {
if (xnthread_test_state(thread, XNRELAX))
xnshadow_renice(thread);
 diff --git a/ksrc/nucleus/sched-sporadic.c b/ksrc/nucleus/sched-sporadic.c
 index fd37c21..ffc9bab 100644
 --- a/ksrc/nucleus/sched-sporadic.c
 +++ b/ksrc/nucleus/sched-sporadic.c
 @@ -258,6 +258,8 @@ static void xnsched_sporadic_setparam(struct xnthread 
 *thread,
}
}
  
 +  if (xnthread_test_state(thread, XNSHADOW))
 +  xnthread_clear_state(thread, XNOTHER);
thread-cprio = p-pss.current_prio;
  }
  
 diff --git a/ksrc/nucleus/sched-tp.c b/ksrc/nucleus/sched-tp.c
 index 43a548e..a2af1d3 100644
 --- a/ksrc/nucleus/sched-tp.c
 +++ b/ksrc/nucleus/sched-tp.c
 @@ -100,6 +100,8 @@ static void xnsched_tp_setparam(struct xnthread 
 *thread,
  {
struct xnsched *sched = thread-sched;
  
 +  if (xnthread_test_state(thread, XNSHADOW))
 +  xnthread_clear_state(thread, XNOTHER);
thread-tps = sched-tp.partitions[p-tp.ptid];
thread-cprio = p-tp.prio;
  }
 diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
 index b956e46..47bc0c5 100644
 --- a/ksrc/nucleus/synch.c
 +++ b/ksrc/nucleus/synch.c
 @@ -684,9 +684,13 @@ xnsynch_release_thread(struct xnsynch *synch, struct 
 xnthread *lastowner)
  
XENO_BUGON(NUCLEUS, !testbits(synch-status, XNSYNCH_OWNER));
  
 -  if (xnthread_test_state(lastowner, XNOTHER))
 -  xnthread_dec_rescnt(lastowner);
 -  XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner)  0);
 +  if (xnthread_test_state(lastowner, XNOTHER

Re: [Xenomai-core] Policy switching and XNOTHER maintenance

2011-09-18 Thread Jan Kiszka
On 2011-09-18 17:11, Jan Kiszka wrote:
 On 2011-09-18 17:10, Philippe Gerum wrote:
 On Sun, 2011-09-18 at 16:34 +0200, Jan Kiszka wrote:
 On 2011-09-18 16:02, Philippe Gerum wrote:
 On Fri, 2011-09-16 at 22:39 +0200, Gilles Chanteperdrix wrote:
 On 09/16/2011 10:13 PM, Gilles Chanteperdrix wrote:
 On 09/11/2011 04:29 PM, Jan Kiszka wrote:
 On 2011-09-11 16:24, Gilles Chanteperdrix wrote:
 On 09/11/2011 12:50 PM, Jan Kiszka wrote:
 Hi all,

 just looked into the hrescnt issue again, specifically the corner case
 of a shadow thread switching from real-time policy to SCHED_OTHER.

 Doing this while holding a mutex looks invalid.

 Looking at POSIX e.g., is there anything in the spec that makes this
 invalid? If the kernel preserves or established proper priority
 boosting, I do not see what could break in principle.

 It is nothing I would design into some app, but we should somehow handle
 it (doc update or code adjustments).

 If we do not do it, the current code is valid.

 Except for its dependency on XNOTHER which is not updated on RT-NORMAL
 transitions.

 The fact that this update did not take place made the code work. No 
 negative rescnt could happen with that code.

 Anyway, here is a patch to allow switching back from RT to NORMAL, but 
 send a SIGDEBUG to a thread attempting to release a mutex while its 
 counter is already 0. We end up avoiding a big chunk of code that would 
 have been useful for a really strange corner case.


 Here comes version 2:
 diff --git a/include/nucleus/sched-idle.h b/include/nucleus/sched-idle.h
 index 6399a17..417170f 100644
 --- a/include/nucleus/sched-idle.h
 +++ b/include/nucleus/sched-idle.h
 @@ -39,6 +39,8 @@ extern struct xnsched_class xnsched_class_idle;
  static inline void __xnsched_idle_setparam(struct xnthread *thread,
  const union xnsched_policy_param *p)
  {
 + if (xnthread_test_state(thread, XNSHADOW))
 + xnthread_clear_state(thread, XNOTHER);
   thread-cprio = p-idle.prio;
  }
  
 diff --git a/include/nucleus/sched-rt.h b/include/nucleus/sched-rt.h
 index 71f655c..cc1cefa 100644
 --- a/include/nucleus/sched-rt.h
 +++ b/include/nucleus/sched-rt.h
 @@ -86,6 +86,12 @@ static inline void __xnsched_rt_setparam(struct 
 xnthread *thread,
const union xnsched_policy_param *p)
  {
   thread-cprio = p-rt.prio;
 + if (xnthread_test_state(thread, XNSHADOW)) {
 + if (thread-cprio)
 + xnthread_clear_state(thread, XNOTHER);
 + else
 + xnthread_set_state(thread, XNOTHER);
 + }
  }
  
  static inline void __xnsched_rt_getparam(struct xnthread *thread,
 diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
 index 9a02e80..d1f 100644
 --- a/ksrc/nucleus/pod.c
 +++ b/ksrc/nucleus/pod.c
 @@ -1896,16 +1896,6 @@ int __xnpod_set_thread_schedparam(struct xnthread 
 *thread,
   xnsched_putback(thread);
  
  #ifdef CONFIG_XENO_OPT_PERVASIVE
 - /*
 -  * A non-real-time shadow may upgrade to real-time FIFO
 -  * scheduling, but the latter may never downgrade to
 -  * SCHED_NORMAL Xenomai-wise. In the valid case, we clear
 -  * XNOTHER to reflect the change. Note that we keep handling
 -  * non real-time shadow specifics in higher code layers, not
 -  * to pollute the core scheduler with peculiarities.
 -  */
 - if (sched_class == xnsched_class_rt  sched_param-rt.prio  0)
 - xnthread_clear_state(thread, XNOTHER);
   if (propagate) {
   if (xnthread_test_state(thread, XNRELAX))
   xnshadow_renice(thread);
 diff --git a/ksrc/nucleus/sched-sporadic.c b/ksrc/nucleus/sched-sporadic.c
 index fd37c21..ffc9bab 100644
 --- a/ksrc/nucleus/sched-sporadic.c
 +++ b/ksrc/nucleus/sched-sporadic.c
 @@ -258,6 +258,8 @@ static void xnsched_sporadic_setparam(struct xnthread 
 *thread,
   }
   }
  
 + if (xnthread_test_state(thread, XNSHADOW))
 + xnthread_clear_state(thread, XNOTHER);
   thread-cprio = p-pss.current_prio;
  }
  
 diff --git a/ksrc/nucleus/sched-tp.c b/ksrc/nucleus/sched-tp.c
 index 43a548e..a2af1d3 100644
 --- a/ksrc/nucleus/sched-tp.c
 +++ b/ksrc/nucleus/sched-tp.c
 @@ -100,6 +100,8 @@ static void xnsched_tp_setparam(struct xnthread 
 *thread,
  {
   struct xnsched *sched = thread-sched;
  
 + if (xnthread_test_state(thread, XNSHADOW))
 + xnthread_clear_state(thread, XNOTHER);
   thread-tps = sched-tp.partitions[p-tp.ptid];
   thread-cprio = p-tp.prio;
  }
 diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
 index b956e46..47bc0c5 100644
 --- a/ksrc/nucleus/synch.c
 +++ b/ksrc/nucleus/synch.c
 @@ -684,9 +684,13 @@ xnsynch_release_thread(struct xnsynch *synch, struct 
 xnthread *lastowner)
  
   XENO_BUGON(NUCLEUS, !testbits(synch-status, XNSYNCH_OWNER));
  
 - if (xnthread_test_state(lastowner, XNOTHER))
 - xnthread_dec_rescnt(lastowner);
 - XENO_BUGON(NUCLEUS, xnthread_get_rescnt(lastowner)  0);
 + if (xnthread_test_state(lastowner, XNOTHER

Re: [Xenomai-core] Policy switching and XNOTHER maintenance

2011-09-18 Thread Jan Kiszka
On 2011-09-18 17:18, Gilles Chanteperdrix wrote:
 On 09/18/2011 05:13 PM, Jan Kiszka wrote:
 On 2011-09-18 17:11, Jan Kiszka wrote:
 On 2011-09-18 17:10, Philippe Gerum wrote:
 On Sun, 2011-09-18 at 16:34 +0200, Jan Kiszka wrote:
 On 2011-09-18 16:02, Philippe Gerum wrote:
 On Fri, 2011-09-16 at 22:39 +0200, Gilles Chanteperdrix wrote:
 On 09/16/2011 10:13 PM, Gilles Chanteperdrix wrote:
 On 09/11/2011 04:29 PM, Jan Kiszka wrote:
 On 2011-09-11 16:24, Gilles Chanteperdrix wrote:
 On 09/11/2011 12:50 PM, Jan Kiszka wrote:
 Hi all,

 just looked into the hrescnt issue again, specifically the corner 
 case
 of a shadow thread switching from real-time policy to SCHED_OTHER.

 Doing this while holding a mutex looks invalid.

 Looking at POSIX e.g., is there anything in the spec that makes this
 invalid? If the kernel preserves or established proper priority
 boosting, I do not see what could break in principle.

 It is nothing I would design into some app, but we should somehow 
 handle
 it (doc update or code adjustments).

 If we do not do it, the current code is valid.

 Except for its dependency on XNOTHER which is not updated on 
 RT-NORMAL
 transitions.

 The fact that this update did not take place made the code work. No 
 negative rescnt could happen with that code.

 Anyway, here is a patch to allow switching back from RT to NORMAL, but 
 send a SIGDEBUG to a thread attempting to release a mutex while its 
 counter is already 0. We end up avoiding a big chunk of code that 
 would 
 have been useful for a really strange corner case.


 Here comes version 2:
 diff --git a/include/nucleus/sched-idle.h b/include/nucleus/sched-idle.h
 index 6399a17..417170f 100644
 --- a/include/nucleus/sched-idle.h
 +++ b/include/nucleus/sched-idle.h
 @@ -39,6 +39,8 @@ extern struct xnsched_class xnsched_class_idle;
  static inline void __xnsched_idle_setparam(struct xnthread *thread,
const union 
 xnsched_policy_param *p)
  {
 +   if (xnthread_test_state(thread, XNSHADOW))
 +   xnthread_clear_state(thread, XNOTHER);
 thread-cprio = p-idle.prio;
  }
  
 diff --git a/include/nucleus/sched-rt.h b/include/nucleus/sched-rt.h
 index 71f655c..cc1cefa 100644
 --- a/include/nucleus/sched-rt.h
 +++ b/include/nucleus/sched-rt.h
 @@ -86,6 +86,12 @@ static inline void __xnsched_rt_setparam(struct 
 xnthread *thread,
  const union 
 xnsched_policy_param *p)
  {
 thread-cprio = p-rt.prio;
 +   if (xnthread_test_state(thread, XNSHADOW)) {
 +   if (thread-cprio)
 +   xnthread_clear_state(thread, XNOTHER);
 +   else
 +   xnthread_set_state(thread, XNOTHER);
 +   }
  }
  
  static inline void __xnsched_rt_getparam(struct xnthread *thread,
 diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
 index 9a02e80..d1f 100644
 --- a/ksrc/nucleus/pod.c
 +++ b/ksrc/nucleus/pod.c
 @@ -1896,16 +1896,6 @@ int __xnpod_set_thread_schedparam(struct 
 xnthread *thread,
 xnsched_putback(thread);
  
  #ifdef CONFIG_XENO_OPT_PERVASIVE
 -   /*
 -* A non-real-time shadow may upgrade to real-time FIFO
 -* scheduling, but the latter may never downgrade to
 -* SCHED_NORMAL Xenomai-wise. In the valid case, we clear
 -* XNOTHER to reflect the change. Note that we keep handling
 -* non real-time shadow specifics in higher code layers, not
 -* to pollute the core scheduler with peculiarities.
 -*/
 -   if (sched_class == xnsched_class_rt  sched_param-rt.prio  
 0)
 -   xnthread_clear_state(thread, XNOTHER);
 if (propagate) {
 if (xnthread_test_state(thread, XNRELAX))
 xnshadow_renice(thread);
 diff --git a/ksrc/nucleus/sched-sporadic.c 
 b/ksrc/nucleus/sched-sporadic.c
 index fd37c21..ffc9bab 100644
 --- a/ksrc/nucleus/sched-sporadic.c
 +++ b/ksrc/nucleus/sched-sporadic.c
 @@ -258,6 +258,8 @@ static void xnsched_sporadic_setparam(struct 
 xnthread *thread,
 }
 }
  
 +   if (xnthread_test_state(thread, XNSHADOW))
 +   xnthread_clear_state(thread, XNOTHER);
 thread-cprio = p-pss.current_prio;
  }
  
 diff --git a/ksrc/nucleus/sched-tp.c b/ksrc/nucleus/sched-tp.c
 index 43a548e..a2af1d3 100644
 --- a/ksrc/nucleus/sched-tp.c
 +++ b/ksrc/nucleus/sched-tp.c
 @@ -100,6 +100,8 @@ static void xnsched_tp_setparam(struct xnthread 
 *thread,
  {
 struct xnsched *sched = thread-sched;
  
 +   if (xnthread_test_state(thread, XNSHADOW))
 +   xnthread_clear_state(thread, XNOTHER);
 thread-tps = sched-tp.partitions[p-tp.ptid];
 thread-cprio = p-tp.prio;
  }
 diff --git a/ksrc/nucleus/synch.c b/ksrc/nucleus/synch.c
 index b956e46..47bc0c5 100644
 --- a/ksrc/nucleus/synch.c
 +++ b/ksrc/nucleus/synch.c
 @@ -684,9 +684,13 @@ xnsynch_release_thread(struct xnsynch *synch

[Xenomai-core] Policy switching and XNOTHER maintenance

2011-09-11 Thread Jan Kiszka
Hi all,

just looked into the hrescnt issue again, specifically the corner case
of a shadow thread switching from real-time policy to SCHED_OTHER.

Looks like we don't support this at all ATM:

do_setsched_event ignores events that enable SCHED_OTHER, and the
XNOTHER flag is only set on xnshadow_map, not updated anywhere else.
Fixing this is not straightforward, as that flag resides in a state
variable that is owned by the thread (i.e. updated non-atomically) while
do_setsched_event can also be called from different contexts.

Or am I missing something?

In the light of this, I would vote for reverting f9bfab3457 (only
update rescnt for XNOTHER threads) as it only adds conditional branches
to the hot paths without solving the issue.

Jan



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Policy switching and XNOTHER maintenance

2011-09-11 Thread Jan Kiszka
On 2011-09-11 12:50, Jan Kiszka wrote:
 Hi all,
 
 just looked into the hrescnt issue again, specifically the corner case
 of a shadow thread switching from real-time policy to SCHED_OTHER.
 
 Looks like we don't support this at all ATM:
 
 do_setsched_event ignores events that enabled SCHED_OTHER, and the
 XNOTHER flag is only set on xnshadow_map, not updated anywhere else.
 Fixing this is not straightforward as that flag resides in a state
 variable that is owned by the thread (ie. updated non-atomically) while
 do_setsched_event can also be called over different contexts.

OK, it just takes nklock to update the thread state.

 
 Or am I missing something?

From __xnpod_set_thread_schedparam:

A non-real-time shadow may upgrade to real-time FIFO
scheduling, but the latter may never downgrade to
SCHED_NORMAL Xenomai-wise. In the valid case, we clear
XNOTHER to reflect the change.

What prevents us from setting XNOTHER if the priority drops to 0, i.e.
the Linux task re-enters SCHED_NORMAL (while Xenomai will still schedule
it via its RT class of course)? We cannot deny such a transition on the
Linux side anyway, can we?
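
For reference, the user-space pattern under discussion boils down to
something like this (sketch only, error handling omitted; the mutex is
assumed to use the PTHREAD_PRIO_INHERIT protocol):

#include <pthread.h>
#include <sched.h>

static pthread_mutex_t lock;	/* assume PTHREAD_PRIO_INHERIT protocol */

static void drop_to_sched_other(void)
{
	struct sched_param param = { .sched_priority = 0 };

	pthread_mutex_lock(&lock);

	/* downgrade the current thread from SCHED_FIFO to SCHED_OTHER
	   while still holding the mutex - the corner case */
	pthread_setschedparam(pthread_self(), SCHED_OTHER, &param);

	pthread_mutex_unlock(&lock);	/* rescnt accounting must cope here */
}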

Jan




___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Policy switching and XNOTHER maintenance

2011-09-11 Thread Jan Kiszka
On 2011-09-11 16:24, Gilles Chanteperdrix wrote:
 On 09/11/2011 12:50 PM, Jan Kiszka wrote:
 Hi all,

 just looked into the hrescnt issue again, specifically the corner case
 of a shadow thread switching from real-time policy to SCHED_OTHER.
 
 Doing this while holding a mutex looks invalid.

Looking at POSIX e.g., is there anything in the spec that makes this
invalid? If the kernel preserves or establishes proper priority
boosting, I do not see what could break in principle.

It is nothing I would design into some app, but we should somehow handle
it (doc update or code adjustments).

 If we do not do it, the current code is valid.

Except for its dependency on XNOTHER which is not updated on RT-NORMAL
transitions.

Jan




___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] xenomai-core ftrace

2011-09-04 Thread Jan Kiszka
On 2011-09-04 07:10, rainbow wrote:
 Sorry to reply so late. I did a test installing ftrace on Xenomai; the
 following is my procedure:
 #git://git.xenomai.org/xenomai-jki.git queues/ftrace
 #git://git.kiszka.org/ipipe-2.6 queues/2.6.35-x86-trace
 #cd queues/ftrace
 #git checkout -b remotes/origin/queues/ftrace
  origin/queues/2.6.35-x86-trace  //change to the ftrace xenomai branch
 #cd ../2.6.35-x86-trace
 #git checkout
 -b origin/queues/2.6.35-x86-trace origin/queues/2.6.35-x86-trace
 #cd ../ftrace
 #./scripts/prepare-kernel.sh  --arch=i386
 --adeos=ksrc/arch/x86/patches/adeos-ipipe-2.6.35.9-x86-2.8-04.patch
 --linux=../2.6.35-x86-trace/
 #cd /2.6.35-x86-trace/
 
 then I compile the kernel but I get the following error message:
 arch/x86/kernel/ipipe.c:851: error: conflicting types for ‘update_vsyscall’
 include/linux/clocksource.h:316: note: previous declaration of
 ‘update_vsyscall’ was here
 make[2]: *** [arch/x86/kernel/ipipe.o] Error 1
 make[1]: *** [arch/x86/kernel] Error 2
 make: *** [arch/x86] Error 2

That's a build issue of the underlying old ipipe patch. However, it's
x86-32 only. And as the documentation stated, only x86-64 is supported
by the ftrace patches. So build for 64 bit instead.

Jan



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] xenomai-core ftrace

2011-09-04 Thread Jan Kiszka
On 2011-09-04 13:49, rainbow wrote:
 You mean I use the remotes/origin/queues/2.6.37-x86 branch and the ipipe
 patch for 2.6.37, then install them on x86_64, and ftrace will work? I will
 have a try, thank you!

Use the 2.6.35-x86-trace branch; it already contains the ipipe patch. Build
it for x86-64.

Jan



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] xenomai-core ftrace

2011-09-04 Thread Jan Kiszka
On 2011-09-04 14:21, rainbow wrote:
 Is the ipipe patch the same as patch like
 adeos-ipipe-2.6.37.6-x86-2.9-02.patch,

Except that the trace branch is for 2.6.35, yes. More precisely it is
now the same, I just pushed the latest version that includes two more
backported ipipe fixes.

 I know the latter is the Xenomai patch,
 and after I apply it, I can see the Real-time sub-system ---> option. But if
 I use 2.6.35-x86-trace, which contains the ipipe patch, there is no such option.

That menu option is introduced by Xenomai, ie. after running
prepare-kernel.sh. You likely forgot that step.

Note again that you have to use a Xenomai tree with the required ftrace
patches on top if you want Xenomai to generate ftrace events as well.

 Another problem is that there are so many Xenomai git repositories; how can I
 download the correct one?

By cloning the git repository you obtain all available branches. You
just need to check out the desired one afterward.

 I am a newbie to Xenomai and I am sorry to ask so many questions, but I want
 to do something on Xenomai :). Thank you for your detailed answers.

Setting up ftrace for Xenomai is not necessarily a newbie task, but I
think I know the background of this. :)

Jan



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] xenomai-core ftrace

2011-09-04 Thread Jan Kiszka
On 2011-09-04 15:16, rainbow wrote:
 2011/9/4 Jan Kiszka jan.kis...@web.de
 
 On 2011-09-04 14:21, rainbow wrote:
 Is the ipipe patch the same as patch like
 adeos-ipipe-2.6.37.6-x86-2.9-02.patch,

 Except that the trace branch is for 2.6.35, yes. More precisely it is
 now the same, I just pushed the latest version that includes two more
 backported ipipe fixes.

 I know the latter is xenomai patch
 and after I patch it, I can see Real-time sub-system  ---  Option. But
 If
 I use 2.6.35-x86-trace which contains ,there is no such option.

 That menu option is introduced by Xenomai, ie. after running
 prepare-kernel.sh. You likely forgot that step.

 
 
 Yes, I forgot the step. So I think I only have to run prepare-kernel.sh
 --arch=x86_64 --linux=2.6.35-x86-trace; I do not need the --adeos option
 because the 2.6.35-x86-trace tree contains the ipipe patch.

 
 
 
 Note again that you have to use a Xenomai tree with the required ftrace
 patches on top if you want Xenomai to generate ftrace events as well.

 "Xenomai tree with required ftrace patches on top" - you mean the branch
 remotes/origin/queues/ftrace?

Yep. I just pushed a rebased version of current git master.

Jan



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] xenomai-core ftrace

2011-09-03 Thread Jan Kiszka
On 2011-09-03 04:52, rainbow wrote:
 Hi all, I want to use ftrace in xenomai-2.5.6, but when I use
 git://git.kiszka.org/ipipe.git queues/2.6.35-x86-trace to get the Linux
 kernel, there is no option about Xenomai or ipipe. If I want to apply the
 Xenomai patch, there are some problems. How should I use ftrace on
 Xenomai? Thanks!

First of all, make sure to read README.INSTALL in the Xenomai tree for
the basic installation procedure.

That git branch above replaces the installation step of picking a
vanilla Linux source tree and applying the ipipe patch to it (if there
is no ipipe option in the kernel config, you probably haven't checked out
the right branch yet).

The next step would be running Xenomai's prepare-kernel.sh, in this case
using a Xenomai tree that has the required ftrace patches, see

http://permalink.gmane.org/gmane.linux.real-time.xenomai.devel/7966

Jan



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?

2011-08-26 Thread Jan Kiszka
On 2011-08-26 14:34, Gilles Chanteperdrix wrote:
 
 Hi,
 
 I think it is about time we release Xenomai 2.6.0. Has anyone anything
 pending (maybe Alex)? Should we release an -rc first?

No patches ATM, but [1] is still an open bug - a bug that affects the ABI.

Jan

[1] http://thread.gmane.org/gmane.linux.real-time.xenomai.devel/8343

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Xenomai 2.6.0, or -rc1?

2011-08-26 Thread Jan Kiszka
On 2011-08-26 20:07, Gilles Chanteperdrix wrote:
 On 08/26/2011 03:05 PM, Jan Kiszka wrote:
 On 2011-08-26 14:34, Gilles Chanteperdrix wrote:

 Hi,

 I think it is about time we release Xenomai 2.6.0. Has anyone anything
 pending (maybe Alex)? Should we release an -rc first?

 No patches ATM, but [1] is still an open bug - a bug that affects the ABI.

 Jan

 [1] http://thread.gmane.org/gmane.linux.real-time.xenomai.devel/8343

 
 I had forgotten about this one. So, the only real problem is if a
 SCHED_NOTOTHER thread switches to SCHED_OTHER. This appears to be a
 corner case, so I wonder if you should not simply add a special
 treatment, only for this corner case.
 
 What I have in mind is keeping a list of xnsynch in kernel-space (this
 basically means having one more xnholder_t in the xnsynch structure), and
 when we trip the corner case (thread with SCHED_FIFO switches to
 SCHED_OTHER), walk the list to find how many xnsynch objects the thread
 owns - we have that info in kernel-space - and set the refcnt accordingly.
 
 Or does it still sound overkill?
 

Mmh, need to think about it. Yeah, we do not support
PTHREAD_MUTEX_INITIALIZER, so we do not share that part of the problem
with futexes.

If we have all objects and can explore ownership, we can also implement
robust mutexes this way, i.e. waiter signaling when the owner dies.
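
A rough sketch of what such a walk could look like (hypothetical 'link'
member and global queue, nklock assumed to be held by the caller):

/* hypothetical: every xnsynch with XNSYNCH_OWNER gets queued on synchq */
static xnqueue_t synchq;

static int count_owned_synchs(struct xnthread *thread)
{
	xnholder_t *holder;
	int count = 0;

	for (holder = getheadq(&synchq); holder;
	     holder = nextq(&synchq, holder)) {
		struct xnsynch *synch =
			container_of(holder, struct xnsynch, link); /* 'link' is the new holder */

		if (xnsynch_owner(synch) == thread)
			count++;
	}

	return count;
}

That keeps the hot paths untouched and only costs a linear walk on the rare
RT-to-SCHED_OTHER downgrade.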

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : rt_print: Provide rt_puts

2011-07-31 Thread Jan Kiszka
On 2011-07-31 19:21, Gilles Chanteperdrix wrote:
 On 07/31/2011 06:49 PM, GIT version control wrote:
 +int rt_puts(const char *s)
 +{
 +return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, NULL);
 +}
 
 gcc for ARM chokes here: it says that NULL can not be converted to a
 va_list, however I try it.

Hmm. Does this work?

diff --git a/src/skins/common/rt_print.c b/src/skins/common/rt_print.c
index 186de48..52538d8 100644
--- a/src/skins/common/rt_print.c
+++ b/src/skins/common/rt_print.c
@@ -243,7 +243,9 @@ int rt_printf(const char *format, ...)

 int rt_puts(const char *s)
 {
-   return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, NULL);
+   va_list dummy;
+
+   return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, dummy);
 }

 void rt_syslog(int priority, const char *format, ...)


Not really beautiful either, I know.

Jan



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : rt_print: Provide rt_puts

2011-07-31 Thread Jan Kiszka
On 2011-07-31 19:46, Gilles Chanteperdrix wrote:
 On 07/31/2011 07:42 PM, Jan Kiszka wrote:
 On 2011-07-31 19:21, Gilles Chanteperdrix wrote:
 On 07/31/2011 06:49 PM, GIT version control wrote:
 +int rt_puts(const char *s)
 +{
 +  return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, NULL);
 +}

 gcc for ARM chokes here: it says that NULL can not be converted to a
 va_list, however I try it.

 Hmm. Does this work?

 diff --git a/src/skins/common/rt_print.c b/src/skins/common/rt_print.c
 index 186de48..52538d8 100644
 --- a/src/skins/common/rt_print.c
 +++ b/src/skins/common/rt_print.c
 @@ -243,7 +243,9 @@ int rt_printf(const char *format, ...)

  int rt_puts(const char *s)
  {
 -return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, NULL);
 +va_list dummy;
 +
 +return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s, dummy);
  }

  void rt_syslog(int priority, const char *format, ...)


 Not really beautiful as well, I know.
 
 It seems to work now, but some later version of gcc may decide to warn
 us that this argument is used without being initialized...

Yes. I've pushed a cleaner version.
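
One possible shape for such a cleanup - purely a sketch, with a hypothetical
vprint_to_buffer() helper carrying the existing logic, not necessarily what
was pushed:

/* core routine becomes variadic, so callers never have to forge a va_list */
static int print_to_buffer(FILE *stream, int priority, unsigned int mode,
			   const char *format, ...)
{
	va_list args;
	int ret;

	va_start(args, format);
	ret = vprint_to_buffer(stream, priority, mode, format, args);
	va_end(args);

	return ret;
}

int rt_puts(const char *s)
{
	/* PUTS mode never touches the (empty) argument list */
	return print_to_buffer(stdout, 0, RT_PRINT_MODE_PUTS, s);
}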

Jan



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] __wrap_clock_gettime

2011-07-29 Thread Jan Kiszka
On 2011-07-28 18:56, Jan Kiszka wrote:
 On 2011-07-27 20:49, Gilles Chanteperdrix wrote:
 On 07/22/2011 05:04 PM, Jan Kiszka wrote:
 Hi Gilles,

 pulling assert_context.c into the common libxenomai created a problem
 around picking the right __wrap_clock_gettime. As libpthread_rt depends
 on libxenomai, the latter is loaded first and defines the debug version
 of __wrap_clock_gettime as the default. There is no chance to pull the
 posix skin implementation.

 I don't have a nice idea yet how to resolve this. Options are:
  - drop __wrap_clock_gettime from assert_context.c (affects any skin
user != posix)
  - put assert_context stuff into separate library again
  - put __wrap_clock_gettime in all skin libs != posix

 I'm favoring the simplest approach ATM, ie. the first one. Other ideas?

 I agree, but I would have thought __attribute__((weak)) takes care of
 this issue.
 
 The point is that once you have pulled in that weak symbol into a
 process, the dynamic loader won't update it if a non-weak version is
 pulled in via dlopen.

Fiddling with all options again, it turned out that the issue is also
reproducible without any dlopen. This ordering is fine:

-lpthread_rt -lxenomai

This one does not work:

-lxenomai -lpthread_rt

Now the problematic ordering can be achieved in many ways, in our case
by linking a top-level process first against a library with
native+xenomai dependencies and then against one with pthread_rt+xenomai
deps. That is resolvable by reordering, but only if you maintain this
meta-information consistently (IOW, it's doomed to break accidentally
again).

So I've pushed a patch that replaces the code with a comment that
explains the non-existence.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [RFC] Fixes for domain migration races

2011-07-28 Thread Jan Kiszka
On 2011-07-27 20:44, Gilles Chanteperdrix wrote:
 On 07/19/2011 08:44 AM, Jan Kiszka wrote:
 Hi,

 I've just uploaded my upstream queue that mostly deals with the various
 races I found in the domain migration code.

 One of my concerns raised earlier turned out to be for no reason: We do
 not allow Linux to wake up a task that has TASK_ATOMICSWITCH set. So the
 deletion race can indeed be fixed by the patch I sent earlier.
 
 So, I still have the same question: is not the solution of synchronizing
 with the gatekeeper as soon as we get out from schedule in secondary
 mode better than waiting for the task_exit callback? It looks more correct,
 and it avoids gksched.

Yes, I was on the wrong track wrt wakeup races during the early
migration phase.

It is a possible and valid scenario that the task returns from
schedule() without being migrated. That can only happen if a signal was
queued in the meantime. The task will not be woken up again, that is
prevented by ATOMICSWITCH, but it will check for pending signals itself
before falling asleep. In that case it will enter TASK_RUNNING again and
return either before the gatekeeper could run or, on SMP, may continue
in parallel on a different CPU.

What saves us now from the fatal scenario that both the task runs and
the gatekeeper resumes its Xenomai part is that TASK_INTERRUPTIBLE state
was left. And if we wait for the gatekeeper to realize this like you
suggested, we ensure that neither the object is deleted too early nor
TASK_INTERRUPTIBLE is reentered again by doing Linux work.

I've cleaned up my queue correspondingly and just pushed it.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] __wrap_clock_gettime

2011-07-28 Thread Jan Kiszka
On 2011-07-27 20:49, Gilles Chanteperdrix wrote:
 On 07/22/2011 05:04 PM, Jan Kiszka wrote:
 Hi Gilles,

 pulling assert_context.c into the common libxenomai created a problem
 around picking the right __wrap_clock_gettime. As libpthread_rt depends
 on libxenomai, the latter is loaded first and defines the debug version
 of __wrap_clock_gettime as the default. There is no chance to pull the
 posix skin implementation.

 I don't have a nice idea yet how to resolve this. Options are:
  - drop __wrap_clock_gettime from assert_context.c (affects any skin
user != posix)
  - put assert_context stuff into separate library again
  - put __wrap_clock_gettime in all skin libs != posix

 I'm favoring the simplest approach ATM, ie. the first one. Other ideas?
 
 I agree, but I would have thought __attribute__((weak)) takes care of
 this issue.

The point is that once you have pulled in that weak symbol into a
process, the dynamic loader won't update it if a non-weak version is
pulled in via dlopen.
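
For context, the pattern in question looks roughly like this (simplified
sketch, not the exact assert_context.c code): libxenomai carries a weak
fallback, while the posix skin provides the strong, Xenomai-aware version:

#include <time.h>

/* the strong wrapper comes from the posix skin; this weak fallback in the
   common library just forwards to the real libc service */
extern int __real_clock_gettime(clockid_t clock_id, struct timespec *tp);

__attribute__((weak))
int __wrap_clock_gettime(clockid_t clock_id, struct timespec *tp)
{
	return __real_clock_gettime(clock_id, tp);
}

At static link time the ordinary strong/weak rules apply, so the posix
skin's strong definition wins when both libraries are linked in directly.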

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] __wrap_clock_gettime

2011-07-28 Thread Jan Kiszka
On 2011-07-28 18:56, Jan Kiszka wrote:
 On 2011-07-27 20:49, Gilles Chanteperdrix wrote:
 On 07/22/2011 05:04 PM, Jan Kiszka wrote:
 Hi Gilles,

 pulling assert_context.c into the common libxenomai created a problem
 around picking the right __wrap_clock_gettime. As libpthread_rt depends
 on libxenomai, the latter is loaded first and defines the debug version
 of __wrap_clock_gettime as the default. There is no chance to pull the
 posix skin implementation.

 I don't have a nice idea yet how to resolve this. Options are:
  - drop __wrap_clock_gettime from assert_context.c (affects any skin
user != posix)
  - put assert_context stuff into separate library again
  - put __wrap_clock_gettime in all skin libs != posix

 I'm favoring the simplest approach ATM, ie. the first one. Other ideas?

 I agree, but I would have thought __attribute__((weak)) takes care of
 this issue.
 
 The point is that once you have pulled in that weak symbol into a
 process, the dynamic loader won't update it if a non-weak version is
 pulled in via dlopen.

Mmh, maybe we have a fourth option: disabling the weak version if
CONFIG_XENO_LIBS_DLOPEN is set. OTOH, the correlation may not be obvious
to users. So that result could be even more confusing.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Debugging Xenomai kernel

2011-07-27 Thread Jan Kiszka
On 2011-07-26 13:54, zenati wrote:
 Dear,
 
 I'm trying to debug Xenomai kernel with GDB and Qemu.

QEMU in emulation or KVM mode?

 So, I launch
 Xenomai in QEMU and make it wait for GDB. Then I start GDB,
 connect it to QEMU, set a breakpoint at the desired function to debug,
 and start Xenomai. If the function belongs to the Linux kernel,
 execution stops at the breakpoint. But if the function belongs to the
 Xenomai nucleus, GDB recognizes the breakpoint at execution time but does
 not stop.

Is Xenomai built as module or part of the main kernel image? The latter
is simpler to handle as you don't have to deal with modules as symbol
sources, find out their addresses, and load them at the right location.

 
 Is it possible to debug Xenomai with GDB and Qemu?

Definitely.
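
A typical setup looks roughly like this (paths, image names and the
breakpoint are only examples):

# start the guest frozen, with a gdb stub listening on port 1234
qemu-system-i386 -s -S -kernel arch/x86/boot/bzImage \
    -append "root=/dev/sda console=ttyS0" -hda rootfs.img

# in a second terminal, attach gdb to the stopped guest
gdb vmlinux
(gdb) target remote :1234
(gdb) hbreak xnpod_schedule
(gdb) continue

Hardware breakpoints (hbreak) tend to be more reliable than software
breakpoints when debugging a kernel under QEMU.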

 Is it a good way to debug?
 
 Thank you for your attention and your help.
 Sincerely,
 
 Omar ZENATI

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Debugging Xenomai kernel

2011-07-27 Thread Jan Kiszka
On 2011-07-27 18:13, zenati wrote:
 On 07/27/2011 02:12 PM, Jan Kiszka wrote:
 On 2011-07-27 14:01, zenati wrote:
 On 07/27/2011 11:23 AM, Jan Kiszka wrote:
 On 2011-07-26 13:54, zenati wrote:
 Dear,

 I'm trying to debug Xenomai kernel with GDB and Qemu.
 QEMU in emulation or KVM mode?

 I'm using Qemu in emulation mode.
 What's your target arch? x86-64, x86-32, or some ARM or PPC etc.
 x86-32
 So, I launch
 Xenomai in Qemu and make it waiting for GDB. Then I start GDB and
 connect them to Qemu, make breakpoint at the desired function to debug
 and start Xenomai. If the function belongs to the linux kernel, the
 execution stop at the breakpoint. But, if the function belongs to the
 Xenomai nucleus, GDB recognize the breakpoint at the execution but don't
 stop.
 Is Xenomai built as module or part of the main kernel image? The latter
 is simpler to handle as you don't have to deal with modules as symbol
 sources, find out their addresses, and load them at the right location.

 Xenomai is built as a part of the main kernel.
 OK. Check during runtime of the guest if the disassembly at the desired
 breakpoint address matches the compiled code. Maybe there is some
 accidental discrepancy.
 The breakpoints correctly match the compiled code. Anyway, shouldn't it stop
 at the breakpoint?

...if the guest actually executes that address.

I haven't used x86 QEMU in emulation mode for quite a while to debug
something. Maybe we have a regression there. What's your QEMU version?
Do you have a chance to test with a KVM host? It's faster anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


[Xenomai-core] [RFC] Fixes for domain migration races

2011-07-19 Thread Jan Kiszka
Hi,

I've just uploaded my upstream queue that mostly deals with the various
races I found in the domain migration code.

One of my concerns raised earlier turned out to be unfounded: We do
not allow Linux to wake up a task that has TASK_ATOMICSWITCH set. So the
deletion race can indeed be fixed by the patch I sent earlier. However,
we do not synchronize setting and testing of TASK_ATOMICSWITCH (because
we cannot hold the rq lock), thus we still face a small race window that
allows premature wakeups, at least in theory. That's now addressed by
patch 3.

Besides another race around set/clear_task_nowakeup, there should have
been a window during early migration to RT where we silently swallowed
Linux signals. Closed by patch 4, hopefully also fixing our spurious gdb
lockups on SMP boxes - time will tell.

Please review carefully.

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


[Xenomai-core] [RFC] Waitqueue-free gatekeeper wakeup

2011-07-18 Thread Jan Kiszka
Hi Philippe,

trying to decouple the PREEMPT-RT gatekeeper wakeup path from XNATOMIC
(to fix the remaining races there), I wondered why we need a waitqueue
here at all.

What about an approach like below, i.e. waking up the gatekeeper
directly via wake_up_process? That could even be called from interrupt
context. We should be able to avoid missing a wakeup by setting the task
state to INTERRUPTIBLE before signaling the semaphore.

Am I missing something?

Jan


diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index e251329..df8853b 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -111,7 +111,6 @@ typedef struct xnsched {
 
 #ifdef CONFIG_XENO_OPT_PERVASIVE
struct task_struct *gatekeeper;
-   wait_queue_head_t gkwaitq;
struct semaphore gksync;
struct xnthread *gktarget;
 #endif
diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index f6b1e16..238317a 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -92,7 +92,6 @@ static struct __lostagerq {
 #define LO_SIGGRP_REQ 2
 #define LO_SIGTHR_REQ 3
 #define LO_UNMAP_REQ  4
-#define LO_GKWAKE_REQ 5
int type;
struct task_struct *task;
int arg;
@@ -759,9 +758,6 @@ static void lostage_handler(void *cookie)
int cpu, reqnum, type, arg, sig, sigarg;
struct __lostagerq *rq;
struct task_struct *p;
-#ifdef CONFIG_PREEMPT_RT
-   struct xnsched *sched;
-#endif
 
cpu = smp_processor_id();
	rq = &lostagerq[cpu];
@@ -819,13 +815,6 @@ static void lostage_handler(void *cookie)
case LO_SIGGRP_REQ:
	kill_proc(p->pid, arg, 1);
break;
-
-#ifdef CONFIG_PREEMPT_RT
-   case LO_GKWAKE_REQ:
-   sched = xnpod_sched_slot(cpu);
-   wake_up_interruptible_sync(&sched->gkwaitq);
-   break;
-#endif
}
}
 }
@@ -873,7 +862,6 @@ static inline int normalize_priority(int prio)
 static int gatekeeper_thread(void *data)
 {
struct task_struct *this_task = current;
-   DECLARE_WAITQUEUE(wait, this_task);
int cpu = (long)data;
struct xnsched *sched = xnpod_sched_slot(cpu);
struct xnthread *target;
@@ -886,12 +874,10 @@ static int gatekeeper_thread(void *data)
set_cpus_allowed(this_task, cpumask);
set_linux_task_priority(this_task, MAX_RT_PRIO - 1);
 
-   init_waitqueue_head(&sched->gkwaitq);
-   add_wait_queue_exclusive(&sched->gkwaitq, &wait);
+   set_current_state(TASK_INTERRUPTIBLE);
	up(&sched->gksync); /* Sync with xnshadow_mount(). */
 
for (;;) {
-   set_current_state(TASK_INTERRUPTIBLE);
	up(&sched->gksync); /* Make the request token available. */
schedule();
 
@@ -937,6 +923,7 @@ static int gatekeeper_thread(void *data)
	xnlock_put_irqrestore(&nklock, s);
xnpod_schedule();
}
+   set_current_state(TASK_INTERRUPTIBLE);
}
 
return 0;
@@ -1014,23 +1001,9 @@ redo:
	thread->gksched = sched;
xnthread_set_info(thread, XNATOMIC);
set_current_state(TASK_INTERRUPTIBLE | TASK_ATOMICSWITCH);
-#ifndef CONFIG_PREEMPT_RT
-   /*
-* We may not hold the preemption lock across calls to
-* wake_up_*() services over fully preemptible kernels, since
-* tasks might sleep when contending for spinlocks. The wake
-* up call for the gatekeeper will happen later, over an APC
-* we kick in do_schedule_event() on the way out for the
-* hardening task.
-*
-* We could delay the wake up call over non-RT 2.6 kernels as
-* well, but not when running over 2.4 (scheduler innards
-* would not allow this, causing weirdnesses when hardening
-* tasks). So we always do the early wake up when running
-* non-RT, which includes 2.4.
-*/
-   wake_up_interruptible_sync(&sched->gkwaitq);
-#endif
+
+   wake_up_process(sched->gatekeeper);
+
schedule();
 
/*

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-16 Thread Jan Kiszka

On 2011-07-15 15:10, Jan Kiszka wrote:
 But... right now it looks like we found our primary regression: 
 nucleus/shadow: shorten the uninterruptible path to secondary mode.
 It opens a short window during relax where the migrated task may be
 active under both schedulers. We are currently evaluating a revert
 (looks good so far), and I need to work out my theory in more
 details.

Looks like this commit just made a long-standing flaw in Xenomai's
interrupt handling more visible: We reschedule over the interrupt stack
in the Xenomai interrupt handler tails, at least on x86-64. Not sure if
other archs have interrupt stacks; the point is Xenomai's design wrongly
assumes there are no such things. We were lucky so far that the values
saved on this shared stack were apparently compatible, meaning we were
overwriting them with identical or harmless values. But that's no longer
true when interrupts are hitting us in the xnpod_suspend_thread path of
a relaxing shadow.

Likely the only possible fix is establishing a reschedule hook for
Xenomai in the interrupt exit path after the original stack is restored
- just as Linux does. Requires changes to both ipipe and Xenomai
unfortunately.

Jan


___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-16 Thread Jan Kiszka

On 2011-07-16 10:52, Philippe Gerum wrote:

On Sat, 2011-07-16 at 10:13 +0200, Jan Kiszka wrote:

On 2011-07-15 15:10, Jan Kiszka wrote:

But... right now it looks like we found our primary regression:
nucleus/shadow: shorten the uninterruptible path to secondary mode.
It opens a short windows during relax where the migrated task may be
active under both schedulers. We are currently evaluating a revert
(looks good so far), and I need to work out my theory in more
details.


Looks like this commit just made a long-standing flaw in Xenomai's
interrupt handling more visible: We reschedule over the interrupt stack
in the Xenomai interrupt handler tails, at least on x86-64. Not sure if
other archs have interrupt stacks, the point is Xenomai's design wrongly
assumes there are no such things.


Fortunately, no, this is not a design issue, no such assumption was ever
made, but the Xenomai core expects this to be handled on a per-arch
basis with the interrupt pipeline.


And that's already the problem: If Linux uses interrupt stacks, relying 
on ipipe to disable this during Xenomai interrupt handler execution is 
at best a workaround. A fragile one unless you increase the per-thread 
stack size by the size of the interrupt stack. Lacking support for a 
generic rescheduling hook became a problem by the time Linux introduced 
interrupt threads.



As you pointed out, there is no way
to handle this via some generic Xenomai-only support.

ppc64 now has separate interrupt stacks, which is why I disabled
IRQSTACKS which became the builtin default at some point. Blackfin goes
through a Xenomai-defined irq tail handler as well, because it may not
reschedule over nested interrupt stacks.


How does this arch prevent xnpod_schedule in the generic interrupt 
handler tail from doing its normal work?



Fact is that such pending
problem with x86_64 was overlooked since day #1 by /me.


  We were lucky so far that the values
saved on this shared stack were apparently compatible, means we were
overwriting them with identical or harmless values. But that's no longer
true when interrupts are hitting us in the xnpod_suspend_thread path of
a relaxing shadow.



Makes sense. It would be better to find a solution that does not make
the relax path uninterruptible again for a significant amount of time.
On low end platforms we support (i.e. non-x86* mainly), this causes
obvious latency spots.


I agree. Conceptually, the interruptible relaxation should be safe now 
after recent fixes.





Likely the only possible fix is establishing a reschedule hook for
Xenomai in the interrupt exit path after the original stack is restored
- just as Linux does. Requires changes to both ipipe and Xenomai
unfortunately.


__ipipe_run_irqtail() is in the I-pipe core for such purpose. If
instantiated properly for x86_64, and paired with xnarch_escalate() for
that arch as well, it could be an option for running the rescheduling
procedure when safe.


Nope, that doesn't work. The stack is switched later in the return path 
in entry_64.S. We need a hook there, ideally a conditional one, 
controlled by some per-cpu variable that is set by Xenomai on return 
from its interrupt handlers to signal the rescheduling need.


Jan

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-16 Thread Jan Kiszka
On 2011-07-16 11:56, Philippe Gerum wrote:
 On Sat, 2011-07-16 at 11:15 +0200, Jan Kiszka wrote:
 On 2011-07-16 10:52, Philippe Gerum wrote:
 On Sat, 2011-07-16 at 10:13 +0200, Jan Kiszka wrote:
 On 2011-07-15 15:10, Jan Kiszka wrote:
 But... right now it looks like we found our primary regression:
 nucleus/shadow: shorten the uninterruptible path to secondary mode.
 It opens a short windows during relax where the migrated task may be
 active under both schedulers. We are currently evaluating a revert
 (looks good so far), and I need to work out my theory in more
 details.

 Looks like this commit just made a long-standing flaw in Xenomai's
 interrupt handling more visible: We reschedule over the interrupt stack
 in the Xenomai interrupt handler tails, at least on x86-64. Not sure if
 other archs have interrupt stacks, the point is Xenomai's design wrongly
 assumes there are no such things.

 Fortunately, no, this is not a design issue, no such assumption was ever
 made, but the Xenomai core expects this to be handled on a per-arch
 basis with the interrupt pipeline.

 And that's already the problem: If Linux uses interrupt stacks, relying 
 on ipipe to disable this during Xenomai interrupt handler execution is 
 at best a workaround. A fragile one unless you increase the per-thread 
 stack size by the size of the interrupt stack. Lacking support for a 
 generic rescheduling hook became a problem by the time Linux introduced 
 interrupt threads.
 
 Don't assume too much. What was done for ppc64 was not meant as a
 general policy. Again, this is a per-arch decision.

Actually, it was the right decision, not only for ppc64: Reusing Linux
interrupt stacks for Xenomai does not work. If we interrupt Linux while
it is already running over the interrupt stack, the stack becomes taboo
on that CPU. From that point on, no RT IRQ must run over the Linux
interrupt stack as it would smash it.

But then the question is why we should try to use the interrupt stacks
for Xenomai at all. It's better to increase the task kernel stacks and
disable interrupt stacks when ipipe is enabled. That's what I'm heading
for with x86-64 now (THREAD_ORDER 2, no stack switching).
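
Roughly, on x86-64 that amounts to something like the following (illustrative
sketch only; the exact header and config guard may differ per kernel version):

/* arch/x86/include/asm/page_64_types.h, sketch */
#ifdef CONFIG_IPIPE
#define THREAD_ORDER	2			/* 16k per-task kernel stacks */
#else
#define THREAD_ORDER	1			/* stock 8k stacks */
#endif
#define THREAD_SIZE	(PAGE_SIZE << THREAD_ORDER)

combined with not switching to the per-cpu interrupt stack on IRQ entry while
the pipeline is enabled.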

What we may do is introduce per-domain interrupt stacks. But that's at
best Xenomai 3 / I-pipe 3 stuff.

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-15 Thread Jan Kiszka
On 2011-07-15 14:30, Gilles Chanteperdrix wrote:
 On 07/14/2011 10:57 PM, Jan Kiszka wrote:
 On 2011-07-13 21:12, Gilles Chanteperdrix wrote:
 On 07/13/2011 09:04 PM, Jan Kiszka wrote:
 On 2011-07-13 20:39, Gilles Chanteperdrix wrote:
 On 07/12/2011 07:43 PM, Jan Kiszka wrote:
 On 2011-07-12 19:38, Gilles Chanteperdrix wrote:
 On 07/12/2011 07:34 PM, Jan Kiszka wrote:
 On 2011-07-12 19:31, Gilles Chanteperdrix wrote:
 On 07/12/2011 02:57 PM, Jan Kiszka wrote:
  xnlock_put_irqrestore(&nklock, s);
  xnpod_schedule();
  }
 @@ -1036,6 +1043,7 @@ redo:
   * to process this signal anyway.
   */
  if (rthal_current_domain == rthal_root_domain) {
 +XENO_BUGON(NUCLEUS, xnthread_test_info(thread, 
 XNATOMIC));

 Misleading dead code again, XNATOMIC is cleared not ten lines above.

 Nope, I forgot to remove that line.


  if (XENO_DEBUG(NUCLEUS) && (!signal_pending(this_task)
  || this_task->state != TASK_RUNNING))
  xnpod_fatal
 @@ -1044,6 +1052,8 @@ redo:
  return -ERESTARTSYS;
  }
  
 +xnthread_clear_info(thread, XNATOMIC);

 Why this? I find the xnthread_clear_info(XNATOMIC) right at the right
 place at the point it currently is.

 Nope. Now we either clear XNATOMIC after successful migration or when
 the signal is about to be sent (ie. in the hook). That way we can test
 more reliably (TM) in the gatekeeper if the thread can be migrated.

 Ok for adding the XNATOMIC test, because it improves the robustness, but
 why change the way XNATOMIC is set and cleared? Chances of breaking
 things while changing code in this area are really high...

 The current code is (most probably) broken as it does not properly
 synchronize the gatekeeper against a signaled and runaway target
 Linux task.

 We need an indication if a Linux signal will (or already has) woken up
 the to-be-migrated task. That task may have continued over its context,
 potentially on a different CPU. Providing this indication is the purpose
 of changing where XNATOMIC is cleared.

 What about synchronizing with the gatekeeper with a semaphore, as done
 in the first patch you sent, but doing it in xnshadow_harden, as soon as
 we detect that we are not back from schedule in primary mode? It seems
 it would avoid any further issue, as we would then be guaranteed that
 the thread could not switch to TASK_INTERRUPTIBLE again before the
 gatekeeper is finished.

 The problem is that the gatekeeper tests the task state without holding
 the task's rq lock (which is not available to us without a kernel
 patch). That cannot work reliably as long as we accept signals. That's
 why I'm trying to move state change and test under nklock.


 What worries me is the comment in xnshadow_harden:

* gatekeeper sent us to primary mode. Since
* TASK_UNINTERRUPTIBLE is unavailable to us without wrecking
* the runqueue's count of uniniterruptible tasks, we just
* notice the issue and gracefully fail; the caller will have
* to process this signal anyway.
*/

 Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this
 point? Or simply that TASK_UNINTERRUPTIBLE is not available for the
 business of xnshadow_harden?


 TASK_UNINTERRUPTIBLE is not available without patching the kernel's
 scheduler for the reason mentioned in the comment (the scheduler becomes
 confused and may pick the wrong tasks, IIRC).

 Does not using down/up in the taskexit event handler risk causing the
 same issue?

 Yes, and that means the first patch is incomplete without something like
 the second.



 But I would refrain from trying to improve the gatekeeper design. I've
 recently mentioned this to Philippe offlist: For Xenomai 3 with some
 ipipe v3, we must rather patch schedule() to enable zero-switch domain
 migration. Means: enter the scheduler, let it suspend current and pick
 another task, but then simply escalate to the RT domain before doing any
 context switch. That's much cheaper than the current design and
 hopefully also less error-prone.

 So, do you want me to merge your for-upstream branch?

 You may merge up to for-upstream^, ie. without any gatekeeper fixes.

 I strongly suspect that there are still more races in the migration
 path. The crashes we face even with all patches applied may be related
 to a shadow task being executed under Linux and Xenomai at the same time.
 
 Maybe we could try the following patch instead?
 
 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index 01f4200..deb7620 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -1033,6 +1033,8 @@ redo:
   xnpod_fatal
  ("xnshadow_harden() failed for thread %s[%d]",
   thread->name, xnthread_user_pid(thread));
  + down(&sched->gksync);
  + up(&sched->gksync);
   return -ERESTARTSYS;
   }
 

I don't think we need

Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-14 Thread Jan Kiszka
On 2011-07-13 21:12, Gilles Chanteperdrix wrote:
 On 07/13/2011 09:04 PM, Jan Kiszka wrote:
 On 2011-07-13 20:39, Gilles Chanteperdrix wrote:
 On 07/12/2011 07:43 PM, Jan Kiszka wrote:
 On 2011-07-12 19:38, Gilles Chanteperdrix wrote:
 On 07/12/2011 07:34 PM, Jan Kiszka wrote:
 On 2011-07-12 19:31, Gilles Chanteperdrix wrote:
 On 07/12/2011 02:57 PM, Jan Kiszka wrote:
xnlock_put_irqrestore(nklock, s);
xnpod_schedule();
}
 @@ -1036,6 +1043,7 @@ redo:
 * to process this signal anyway.
 */
if (rthal_current_domain == rthal_root_domain) {
 +  XENO_BUGON(NUCLEUS, xnthread_test_info(thread, 
 XNATOMIC));

 Misleading dead code again, XNATOMIC is cleared not ten lines above.

 Nope, I forgot to remove that line.


if (XENO_DEBUG(NUCLEUS)  (!signal_pending(this_task)
|| this_task-state != TASK_RUNNING))
xnpod_fatal
 @@ -1044,6 +1052,8 @@ redo:
return -ERESTARTSYS;
}
  
 +  xnthread_clear_info(thread, XNATOMIC);

 Why this? I find the xnthread_clear_info(XNATOMIC) right at the right
 place at the point it currently is.

 Nope. Now we either clear XNATOMIC after successful migration or when
 the signal is about to be sent (ie. in the hook). That way we can test
 more reliably (TM) in the gatekeeper if the thread can be migrated.

 Ok for adding the XNATOMIC test, because it improves the robustness, but
 why changing the way XNATOMIC is set and clear? Chances of breaking
 thing while changing code in this area are really high...

 The current code is (most probably) broken as it does not properly
 synchronizes the gatekeeper against a signaled and runaway target
 Linux task.

 We need an indication if a Linux signal will (or already has) woken up
 the to-be-migrated task. That task may have continued over its context,
 potentially on a different CPU. Providing this indication is the purpose
 of changing where XNATOMIC is cleared.

 What about synchronizing with the gatekeeper with a semaphore, as done
 in the first patch you sent, but doing it in xnshadow_harden, as soon as
 we detect that we are not back from schedule in primary mode? It seems
 it would avoid any further issue, as we would then be guaranteed that
 the thread could not switch to TASK_INTERRUPTIBLE again before the
 gatekeeper is finished.

 The problem is that the gatekeeper tests the task state without holding
 the task's rq lock (which is not available to us without a kernel
 patch). That cannot work reliably as long as we accept signals. That's
 why I'm trying to move state change and test under nklock.


 What worries me is the comment in xnshadow_harden:

  * gatekeeper sent us to primary mode. Since
  * TASK_UNINTERRUPTIBLE is unavailable to us without wrecking
  * the runqueue's count of uniniterruptible tasks, we just
  * notice the issue and gracefully fail; the caller will have
  * to process this signal anyway.
  */

 Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this
 point? Or simply that TASK_UNINTERRUPTIBLE is not available for the
 business of xnshadow_harden?


 TASK_UNINTERRUPTIBLE is not available without patching the kernel's
 scheduler for the reason mentioned in the comment (the scheduler becomes
 confused and may pick the wrong tasks, IIRC).
 
 Does not using down/up in the taskexit event handler risk to cause the
 same issue?

Yes, and that means the first patch is incomplete without something like
the second.

 

 But I would refrain from trying to improve the gatekeeper design. I've
 recently mentioned this to Philippe offlist: For Xenomai 3 with some
 ipipe v3, we must rather patch schedule() to enable zero-switch domain
 migration. Means: enter the scheduler, let it suspend current and pick
 another task, but then simply escalate to the RT domain before doing any
 context switch. That's much cheaper than the current design and
 hopefully also less error-prone.
 
 So, do you want me to merge your for-upstream branch?

You may merge up to for-upstream^, ie. without any gatekeeper fixes.

I strongly suspect that there are still more races in the migration
path. The crashes we face even with all patches applied may be related
to a shadow task being executed under Linux and Xenomai at the same time.

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-13 Thread Jan Kiszka
On 2011-07-13 20:39, Gilles Chanteperdrix wrote:
 On 07/12/2011 07:43 PM, Jan Kiszka wrote:
 On 2011-07-12 19:38, Gilles Chanteperdrix wrote:
 On 07/12/2011 07:34 PM, Jan Kiszka wrote:
 On 2011-07-12 19:31, Gilles Chanteperdrix wrote:
 On 07/12/2011 02:57 PM, Jan Kiszka wrote:
  xnlock_put_irqrestore(nklock, s);
  xnpod_schedule();
  }
 @@ -1036,6 +1043,7 @@ redo:
   * to process this signal anyway.
   */
  if (rthal_current_domain == rthal_root_domain) {
 +XENO_BUGON(NUCLEUS, xnthread_test_info(thread, 
 XNATOMIC));

 Misleading dead code again, XNATOMIC is cleared not ten lines above.

 Nope, I forgot to remove that line.


  if (XENO_DEBUG(NUCLEUS)  (!signal_pending(this_task)
  || this_task-state != TASK_RUNNING))
  xnpod_fatal
 @@ -1044,6 +1052,8 @@ redo:
  return -ERESTARTSYS;
  }
  
 +xnthread_clear_info(thread, XNATOMIC);

 Why this? I find the xnthread_clear_info(XNATOMIC) right at the right
 place at the point it currently is.

 Nope. Now we either clear XNATOMIC after successful migration or when
 the signal is about to be sent (ie. in the hook). That way we can test
 more reliably (TM) in the gatekeeper if the thread can be migrated.

 Ok for adding the XNATOMIC test, because it improves the robustness, but
 why changing the way XNATOMIC is set and clear? Chances of breaking
 thing while changing code in this area are really high...

 The current code is (most probably) broken as it does not properly
 synchronizes the gatekeeper against a signaled and runaway target
 Linux task.

 We need an indication if a Linux signal will (or already has) woken up
 the to-be-migrated task. That task may have continued over its context,
 potentially on a different CPU. Providing this indication is the purpose
 of changing where XNATOMIC is cleared.
 
 What about synchronizing with the gatekeeper with a semaphore, as done
 in the first patch you sent, but doing it in xnshadow_harden, as soon as
 we detect that we are not back from schedule in primary mode? It seems
 it would avoid any further issue, as we would then be guaranteed that
 the thread could not switch to TASK_INTERRUPTIBLE again before the
 gatekeeper is finished.

The problem is that the gatekeeper tests the task state without holding
the task's rq lock (which is not available to us without a kernel
patch). That cannot work reliably as long as we accept signals. That's
why I'm trying to move state change and test under nklock.

 
 What worries me is the comment in xnshadow_harden:
 
* gatekeeper sent us to primary mode. Since
* TASK_UNINTERRUPTIBLE is unavailable to us without wrecking
* the runqueue's count of uniniterruptible tasks, we just
* notice the issue and gracefully fail; the caller will have
* to process this signal anyway.
*/
 
 Does this mean that we can not switch to TASK_UNINTERRUPTIBLE at this
 point? Or simply that TASK_UNINTERRUPTIBLE is not available for the
 business of xnshadow_harden?
 

TASK_UNINTERRUPTIBLE is not available without patching the kernel's
scheduler for the reason mentioned in the comment (the scheduler becomes
confused and may pick the wrong tasks, IIRC).

But I would refrain from trying to improve the gatekeeper design. I've
recently mentioned this to Philippe offlist: For Xenomai 3 with some
ipipe v3, we must rather patch schedule() to enable zero-switch domain
migration. Means: enter the scheduler, let it suspend current and pick
another task, but then simply escalate to the RT domain before doing any
context switch. That's much cheaper than the current design and
hopefully also less error-prone.
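
Purely to illustrate the idea (pseudo-code; both helpers named here are
hypothetical and do not exist in any current ipipe patch):

/* inside schedule(), after the next task has been picked */
if (prev_is_hardening_shadow(prev)) {		/* hypothetical check */
	/*
	 * Do not switch to 'next' from the root domain. Escalate to the
	 * Xenomai domain instead and let its scheduler resume 'prev' in
	 * primary mode - no extra context switch needed.
	 */
	ipipe_escalate_to_head(prev);		/* hypothetical hook */
	return;
}
context_switch(rq, prev, next);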

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-12 Thread Jan Kiszka
On 2011-07-12 08:41, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:12 PM, Jan Kiszka wrote:
 On 2011-07-11 22:09, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:06 PM, Jan Kiszka wrote:
 On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:59 PM, Jan Kiszka wrote:
 On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:16 PM, Jan Kiszka wrote:
 On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:
 @@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct 
 task_struct *p)
 magic = xnthread_get_magic(thread);
  
 xnlock_get_irqsave(&nklock, s);
 +
 +   gksched = thread->gksched;
 +   if (gksched) {
 +   xnlock_put_irqrestore(&nklock, s);

 Are we sure irqs are on here? Are you sure that what is needed is 
 not an
 xnlock_clear_irqon?

 We are in the context of do_exit. Not only IRQs are on, also 
 preemption.
 And surely no nklock is held.

 Furthermore, I do not understand how we
 synchronize with the gatekeeper, how is the gatekeeper guaranteed to
 wait for this assignment?

 The gatekeeper holds the gksync token while it's active. We request 
 it,
 thus we wait for the gatekeeper to become idle again. While it is 
 idle,
 we reset the queued reference - but I just realized that this may 
 tramp
 on other tasks' values. I need to add a check that the value to be
 null'ified is actually still ours.

 Thinking again, that's actually not a problem: gktarget is only needed
 while gksync is zero - but then we won't get hold of it anyway and,
 thus, can't cause any damage.

 Well, you make it look like it does not work. From what I understand,
 what you want is to set gktarget to null if a task being hardened is
 destroyed. But by waiting for the semaphore, you actually wait for the
 harden to be complete, so setting to NULL is useless. Or am I missing
 something else?

 Setting to NULL is probably unneeded but still better than relying on the
 gatekeeper never waking up spuriously and then dereferencing a stale
 pointer.

 The key element of this fix is waiting on gksync, thus on the completion
 of the non-RT part of the hardening. Actually, this part usually fails
 as the target task received a termination signal at this point.

 Yes, but since you wait on the completion of the hardening, the test
 if (target ...) in the gatekeeper code will always be true, because at
 this point the cleanup code will still be waiting for the semaphore.

 Yes, except we may still wake up the gatekeeper later on without an
 updated gktarget, i.e. spuriously. Better safe than sorry, this is hairy
 code anyway (hopefully obsolete one day).

 The gatekeeper is not woken up by posting the semaphore, the gatekeeper
 is woken up by the thread which is going to be hardened (and this thread
 is the one which waits for the semaphore).

 All true. And what is the point?
 
 The point being, would not something like this patch be sufficient?
 
 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index 01f4200..4742c02 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -2527,6 +2527,18 @@ static inline void do_taskexit_event(struct
 task_struct *p)
   magic = xnthread_get_magic(thread);
 
  xnlock_get_irqsave(&nklock, s);
 + if (xnthread_test_info(thread, XNATOMIC)) {
 + struct xnsched *gksched = xnpod_sched_slot(task_cpu(p));

That's not reliable, the task might have been migrated by Linux in the
meantime. We must use the stored gksched.

 + xnlock_put_irqrestore(&nklock, s);
 +
 + /* Thread is in flight to primary mode, wait for the
 +gatekeeper to be done with it. */
 + down(&gksched->gksync);
 + up(&gksched->gksync);
 +
 + xnlock_get_irqsave(&nklock, s);
 + }
 +
   /* Prevent wakeup call from xnshadow_unmap(). */
   xnshadow_thrptd(p) = NULL;
  xnthread_archtcb(thread)->user_task = NULL;
 

Again, setting gktarget to NULL and testing for NULL is simply safer,
and I see no gain in skipping that. But if you prefer the
micro-optimization, I'll drop it.

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-12 Thread Jan Kiszka
On 2011-07-12 12:59, Gilles Chanteperdrix wrote:
 On 07/12/2011 09:22 AM, Jan Kiszka wrote:
 On 2011-07-12 08:41, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:12 PM, Jan Kiszka wrote:
 On 2011-07-11 22:09, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:06 PM, Jan Kiszka wrote:
 On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:59 PM, Jan Kiszka wrote:
 On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:16 PM, Jan Kiszka wrote:
 On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:
 @@ -2528,6 +2534,22 @@ static inline void 
 do_taskexit_event(struct task_struct *p)
   magic = xnthread_get_magic(thread);
  
   xnlock_get_irqsave(nklock, s);
 +
 + gksched = thread-gksched;
 + if (gksched) {
 + xnlock_put_irqrestore(nklock, s);

 Are we sure irqs are on here? Are you sure that what is needed is 
 not an
 xnlock_clear_irqon?

 We are in the context of do_exit. Not only IRQs are on, also 
 preemption.
 And surely no nklock is held.

 Furthermore, I do not understand how we
 synchronize with the gatekeeper, how is the gatekeeper garanteed 
 to
 wait for this assignment?

 The gatekeeper holds the gksync token while it's active. We request 
 it,
 thus we wait for the gatekeeper to become idle again. While it is 
 idle,
 we reset the queued reference - but I just realized that this may 
 tramp
 on other tasks' values. I need to add a check that the value to be
 null'ified is actually still ours.

 Thinking again, that's actually not a problem: gktarget is only 
 needed
 while gksync is zero - but then we won't get hold of it anyway and,
 thus, can't cause any damage.

 Well, you make it look like it does not work. From what I understand,
 what you want is to set gktarget to null if a task being hardened is
 destroyed. But by waiting for the semaphore, you actually wait for the
 harden to be complete, so setting to NULL is useless. Or am I missing
 something else?

 Setting to NULL is probably unneeded but still better than rely on the
 gatekeeper never waking up spuriously and then dereferencing a stale
 pointer.

 The key element of this fix is waitng on gksync, thus on the completion
 of the non-RT part of the hardening. Actually, this part usually fails
 as the target task received a termination signal at this point.

 Yes, but since you wait on the completion of the hardening, the test
 if (target ...) in the gatekeeper code will always be true, because at
 this point the cleanup code will still be waiting for the semaphore.

 Yes, except we will ever wake up the gatekeeper later on without an
 updated gktarget, ie. spuriously. Better safe than sorry, this is hairy
 code anyway (hopefully obsolete one day).

 The gatekeeper is not woken up by posting the semaphore, the gatekeeper
 is woken up by the thread which is going to be hardened (and this thread
 is the one which waits for the semaphore).

 All true. And what is the point?

 The point being, would not something like this patch be sufficient?

 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index 01f4200..4742c02 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -2527,6 +2527,18 @@ static inline void do_taskexit_event(struct
 task_struct *p)
 magic = xnthread_get_magic(thread);

 xnlock_get_irqsave(nklock, s);
 +   if (xnthread_test_info(thread, XNATOMIC)) {
 +   struct xnsched *gksched = xnpod_sched_slot(task_cpu(p));

 That's not reliable, the task might have been migrated by Linux in the
 meantime. We must use the stored gksched.

 +   xnlock_put_irqrestore(nklock, s);
 +
 +   /* Thread is in flight to primary mode, wait for the
 +  gatekeeper to be done with it. */
 +   down(gksched-gksync);
 +   up(gksched-gksync);
 +
 +   xnlock_get_irqsave(nklock, s);
 +   }
 +
 /* Prevent wakeup call from xnshadow_unmap(). */
 xnshadow_thrptd(p) = NULL;
 xnthread_archtcb(thread)-user_task = NULL;


 Again, setting gktarget to NULL and testing for NULL is simply safer,
 and I see no gain in skipping that. But if you prefer the
 micro-optimization, I'll drop it.
 
 Could not we use an info bit instead of adding a pointer?
 

That's not reliable, the task might have been migrated by Linux in the
meantime. We must use the stored gksched.

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-12 Thread Jan Kiszka
On 2011-07-12 13:04, Gilles Chanteperdrix wrote:
 On 07/12/2011 01:00 PM, Jan Kiszka wrote:
 On 2011-07-12 12:59, Gilles Chanteperdrix wrote:
 On 07/12/2011 09:22 AM, Jan Kiszka wrote:
 On 2011-07-12 08:41, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:12 PM, Jan Kiszka wrote:
 On 2011-07-11 22:09, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:06 PM, Jan Kiszka wrote:
 On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:59 PM, Jan Kiszka wrote:
 On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:16 PM, Jan Kiszka wrote:
 On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:
 @@ -2528,6 +2534,22 @@ static inline void 
 do_taskexit_event(struct task_struct *p)
 magic = xnthread_get_magic(thread);
  
 xnlock_get_irqsave(nklock, s);
 +
 +   gksched = thread-gksched;
 +   if (gksched) {
 +   xnlock_put_irqrestore(nklock, s);

 Are we sure irqs are on here? Are you sure that what is needed 
 is not an
 xnlock_clear_irqon?

 We are in the context of do_exit. Not only IRQs are on, also 
 preemption.
 And surely no nklock is held.

 Furthermore, I do not understand how we
 synchronize with the gatekeeper, how is the gatekeeper 
 garanteed to
 wait for this assignment?

 The gatekeeper holds the gksync token while it's active. We 
 request it,
 thus we wait for the gatekeeper to become idle again. While it is 
 idle,
 we reset the queued reference - but I just realized that this may 
 tramp
 on other tasks' values. I need to add a check that the value to be
 null'ified is actually still ours.

 Thinking again, that's actually not a problem: gktarget is only 
 needed
 while gksync is zero - but then we won't get hold of it anyway and,
 thus, can't cause any damage.

 Well, you make it look like it does not work. From what I 
 understand,
 what you want is to set gktarget to null if a task being hardened is
 destroyed. But by waiting for the semaphore, you actually wait for 
 the
 harden to be complete, so setting to NULL is useless. Or am I 
 missing
 something else?

 Setting to NULL is probably unneeded but still better than rely on 
 the
 gatekeeper never waking up spuriously and then dereferencing a stale
 pointer.

 The key element of this fix is waitng on gksync, thus on the 
 completion
 of the non-RT part of the hardening. Actually, this part usually 
 fails
 as the target task received a termination signal at this point.

 Yes, but since you wait on the completion of the hardening, the test
 if (target ...) in the gatekeeper code will always be true, because 
 at
 this point the cleanup code will still be waiting for the semaphore.

 Yes, except we will ever wake up the gatekeeper later on without an
 updated gktarget, ie. spuriously. Better safe than sorry, this is hairy
 code anyway (hopefully obsolete one day).

 The gatekeeper is not woken up by posting the semaphore, the gatekeeper
 is woken up by the thread which is going to be hardened (and this thread
 is the one which waits for the semaphore).

 All true. And what is the point?

 The point being, would not something like this patch be sufficient?

 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index 01f4200..4742c02 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -2527,6 +2527,18 @@ static inline void do_taskexit_event(struct
 task_struct *p)
   magic = xnthread_get_magic(thread);

   xnlock_get_irqsave(nklock, s);
 + if (xnthread_test_info(thread, XNATOMIC)) {
 + struct xnsched *gksched = xnpod_sched_slot(task_cpu(p));

 That's not reliable, the task might have been migrated by Linux in the
 meantime. We must use the stored gksched.

 + xnlock_put_irqrestore(nklock, s);
 +
 + /* Thread is in flight to primary mode, wait for the
 +gatekeeper to be done with it. */
 + down(gksched-gksync);
 + up(gksched-gksync);
 +
 + xnlock_get_irqsave(nklock, s);
 + }
 +
   /* Prevent wakeup call from xnshadow_unmap(). */
   xnshadow_thrptd(p) = NULL;
   xnthread_archtcb(thread)-user_task = NULL;


 Again, setting gktarget to NULL and testing for NULL is simply safer,
 and I see no gain in skipping that. But if you prefer the
 micro-optimization, I'll drop it.

 Could not we use an info bit instead of adding a pointer?


 That's not reliable, the task might have been migrated by Linux in the
 meantime. We must use the stored gksched.
 
 I mean add another info bit to mean that the task is queued for wakeup
 by the gatekeeper.
 
 XNGKQ, or something.

What additional value does it provide over gksched != NULL? We need that
pointer anyway to identify the gatekeeper that holds a reference.

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-12 Thread Jan Kiszka
On 2011-07-12 13:08, Gilles Chanteperdrix wrote:
 On 07/12/2011 01:06 PM, Jan Kiszka wrote:
 On 2011-07-12 13:04, Gilles Chanteperdrix wrote:
 On 07/12/2011 01:00 PM, Jan Kiszka wrote:
 On 2011-07-12 12:59, Gilles Chanteperdrix wrote:
 On 07/12/2011 09:22 AM, Jan Kiszka wrote:
 On 2011-07-12 08:41, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:12 PM, Jan Kiszka wrote:
 On 2011-07-11 22:09, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:06 PM, Jan Kiszka wrote:
 On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:59 PM, Jan Kiszka wrote:
 On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:16 PM, Jan Kiszka wrote:
 On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:
 @@ -2528,6 +2534,22 @@ static inline void 
 do_taskexit_event(struct task_struct *p)
   magic = xnthread_get_magic(thread);
  
   xnlock_get_irqsave(nklock, s);
 +
 + gksched = thread-gksched;
 + if (gksched) {
 + xnlock_put_irqrestore(nklock, s);

 Are we sure irqs are on here? Are you sure that what is needed 
 is not an
 xnlock_clear_irqon?

 We are in the context of do_exit. Not only IRQs are on, also 
 preemption.
 And surely no nklock is held.

 Furthermore, I do not understand how we
 synchronize with the gatekeeper, how is the gatekeeper 
 garanteed to
 wait for this assignment?

 The gatekeeper holds the gksync token while it's active. We 
 request it,
 thus we wait for the gatekeeper to become idle again. While it 
 is idle,
 we reset the queued reference - but I just realized that this 
 may tramp
 on other tasks' values. I need to add a check that the value to 
 be
 null'ified is actually still ours.

 Thinking again, that's actually not a problem: gktarget is only 
 needed
 while gksync is zero - but then we won't get hold of it anyway 
 and,
 thus, can't cause any damage.

 Well, you make it look like it does not work. From what I 
 understand,
 what you want is to set gktarget to null if a task being hardened 
 is
 destroyed. But by waiting for the semaphore, you actually wait 
 for the
 harden to be complete, so setting to NULL is useless. Or am I 
 missing
 something else?

 Setting to NULL is probably unneeded but still better than rely on 
 the
 gatekeeper never waking up spuriously and then dereferencing a 
 stale
 pointer.

 The key element of this fix is waitng on gksync, thus on the 
 completion
 of the non-RT part of the hardening. Actually, this part usually 
 fails
 as the target task received a termination signal at this point.

 Yes, but since you wait on the completion of the hardening, the test
 if (target ...) in the gatekeeper code will always be true, 
 because at
 this point the cleanup code will still be waiting for the semaphore.

 Yes, except we will ever wake up the gatekeeper later on without an
 updated gktarget, ie. spuriously. Better safe than sorry, this is 
 hairy
 code anyway (hopefully obsolete one day).

 The gatekeeper is not woken up by posting the semaphore, the 
 gatekeeper
 is woken up by the thread which is going to be hardened (and this 
 thread
 is the one which waits for the semaphore).

 All true. And what is the point?

 The point being, would not something like this patch be sufficient?

 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index 01f4200..4742c02 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -2527,6 +2527,18 @@ static inline void do_taskexit_event(struct
 task_struct *p)
 magic = xnthread_get_magic(thread);

 xnlock_get_irqsave(nklock, s);
 +   if (xnthread_test_info(thread, XNATOMIC)) {
 +   struct xnsched *gksched = xnpod_sched_slot(task_cpu(p));

 That's not reliable, the task might have been migrated by Linux in the
 meantime. We must use the stored gksched.

 +   xnlock_put_irqrestore(nklock, s);
 +
 +   /* Thread is in flight to primary mode, wait for the
 +  gatekeeper to be done with it. */
 +   down(gksched-gksync);
 +   up(gksched-gksync);
 +
 +   xnlock_get_irqsave(nklock, s);
 +   }
 +
 /* Prevent wakeup call from xnshadow_unmap(). */
 xnshadow_thrptd(p) = NULL;
 xnthread_archtcb(thread)-user_task = NULL;


 Again, setting gktarget to NULL and testing for NULL is simply safer,
 and I see no gain in skipping that. But if you prefer the
 micro-optimization, I'll drop it.

 Could not we use an info bit instead of adding a pointer?


 That's not reliable, the task might have been migrated by Linux in the
 meantime. We must use the stored gksched.

 I mean add another info bit to mean that the task is queued for wakeup
 by the gatekeeper.

 XNGKQ, or something.

 What additional value does it provide to gksched != NULL? We need that
 pointer anyway to identify the gatekeeper that holds a reference.
 
 No, the scheduler which holds the reference

Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-12 Thread Jan Kiszka
On 2011-07-12 13:26, Gilles Chanteperdrix wrote:
 On 07/12/2011 01:10 PM, Jan Kiszka wrote:
 On 2011-07-12 13:08, Gilles Chanteperdrix wrote:
 On 07/12/2011 01:06 PM, Jan Kiszka wrote:
 On 2011-07-12 13:04, Gilles Chanteperdrix wrote:
 On 07/12/2011 01:00 PM, Jan Kiszka wrote:
 On 2011-07-12 12:59, Gilles Chanteperdrix wrote:
 On 07/12/2011 09:22 AM, Jan Kiszka wrote:
 On 2011-07-12 08:41, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:12 PM, Jan Kiszka wrote:
 On 2011-07-11 22:09, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:06 PM, Jan Kiszka wrote:
 On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:59 PM, Jan Kiszka wrote:
 On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:16 PM, Jan Kiszka wrote:
 On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:
 @@ -2528,6 +2534,22 @@ static inline void 
 do_taskexit_event(struct task_struct *p)
 magic = xnthread_get_magic(thread);
  
 xnlock_get_irqsave(nklock, s);
 +
 +   gksched = thread-gksched;
 +   if (gksched) {
 +   xnlock_put_irqrestore(nklock, s);

 Are we sure irqs are on here? Are you sure that what is 
 needed is not an
 xnlock_clear_irqon?

 We are in the context of do_exit. Not only IRQs are on, also 
 preemption.
 And surely no nklock is held.

 Furthermore, I do not understand how we
 synchronize with the gatekeeper, how is the gatekeeper 
 garanteed to
 wait for this assignment?

 The gatekeeper holds the gksync token while it's active. We 
 request it,
 thus we wait for the gatekeeper to become idle again. While 
 it is idle,
 we reset the queued reference - but I just realized that this 
 may tramp
 on other tasks' values. I need to add a check that the value 
 to be
 null'ified is actually still ours.

 Thinking again, that's actually not a problem: gktarget is 
 only needed
 while gksync is zero - but then we won't get hold of it anyway 
 and,
 thus, can't cause any damage.

 Well, you make it look like it does not work. From what I 
 understand,
 what you want is to set gktarget to null if a task being 
 hardened is
 destroyed. But by waiting for the semaphore, you actually wait 
 for the
 harden to be complete, so setting to NULL is useless. Or am I 
 missing
 something else?

 Setting to NULL is probably unneeded but still better than rely 
 on the
 gatekeeper never waking up spuriously and then dereferencing a 
 stale
 pointer.

 The key element of this fix is waitng on gksync, thus on the 
 completion
 of the non-RT part of the hardening. Actually, this part usually 
 fails
 as the target task received a termination signal at this point.

 Yes, but since you wait on the completion of the hardening, the 
 test
 if (target ...) in the gatekeeper code will always be true, 
 because at
 this point the cleanup code will still be waiting for the 
 semaphore.

 Yes, except we will ever wake up the gatekeeper later on without an
 updated gktarget, ie. spuriously. Better safe than sorry, this is 
 hairy
 code anyway (hopefully obsolete one day).

 The gatekeeper is not woken up by posting the semaphore, the 
 gatekeeper
 is woken up by the thread which is going to be hardened (and this 
 thread
 is the one which waits for the semaphore).

 All true. And what is the point?

 The point being, would not something like this patch be sufficient?

 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index 01f4200..4742c02 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -2527,6 +2527,18 @@ static inline void do_taskexit_event(struct
 task_struct *p)
   magic = xnthread_get_magic(thread);

   xnlock_get_irqsave(nklock, s);
 + if (xnthread_test_info(thread, XNATOMIC)) {
 + struct xnsched *gksched = xnpod_sched_slot(task_cpu(p));

 That's not reliable, the task might have been migrated by Linux in the
 meantime. We must use the stored gksched.

 + xnlock_put_irqrestore(nklock, s);
 +
 + /* Thread is in flight to primary mode, wait for the
 +gatekeeper to be done with it. */
 + down(gksched-gksync);
 + up(gksched-gksync);
 +
 + xnlock_get_irqsave(nklock, s);
 + }
 +
   /* Prevent wakeup call from xnshadow_unmap(). */
   xnshadow_thrptd(p) = NULL;
   xnthread_archtcb(thread)-user_task = NULL;


 Again, setting gktarget to NULL and testing for NULL is simply safer,
 and I see no gain in skipping that. But if you prefer the
 micro-optimization, I'll drop it.

 Could not we use an info bit instead of adding a pointer?


 That's not reliable, the task might have been migrated by Linux in the
 meantime. We must use the stored gksched.

 I mean add another info bit to mean that the task is queued for wakeup
 by the gatekeeper.

 XNGKQ, or something.

 What additional value does it provide to gksched != NULL? We need that
 pointer anyway to identify the gatekeeper that holds a reference

Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-12 Thread Jan Kiszka
On 2011-07-12 13:56, Jan Kiszka wrote:
 However, this parallel unsynchronized execution of the gatekeeper and
 its target thread leaves an increasingly bad feeling on my side. Did we
 really catch all corner cases now? I wouldn't guarantee that yet.
 Specifically as I still have an obscure crash of a Xenomai thread on
 Linux schedule() on my table.
 
 What if the target thread woke up due to a signal, continued much
 further on a different CPU, blocked in TASK_INTERRUPTIBLE, and then the
 gatekeeper continued? I wish we could already eliminate this complexity
 and do the migration directly inside schedule()...

BTW, why do we mask out TASK_ATOMICSWITCH when checking the task state in
the gatekeeper? What would happen if we included it (state ==
(TASK_ATOMICSWITCH | TASK_INTERRUPTIBLE))?

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-12 Thread Jan Kiszka
On 2011-07-12 14:06, Gilles Chanteperdrix wrote:
 On 07/12/2011 01:58 PM, Jan Kiszka wrote:
 On 2011-07-12 13:56, Jan Kiszka wrote:
 However, this parallel unsynchronized execution of the gatekeeper and
 its target thread leaves an increasingly bad feeling on my side. Did we
 really catch all corner cases now? I wouldn't guarantee that yet.
 Specifically as I still have an obscure crash of a Xenomai thread on
 Linux schedule() on my table.

 What if the target thread woke up due to a signal, continued much
 further on a different CPU, blocked in TASK_INTERRUPTIBLE, and then the
 gatekeeper continued? I wish we could already eliminate this complexity
 and do the migration directly inside schedule()...

 BTW, why do we mask out TASK_ATOMICSWITCH when checking the task state in
 the gatekeeper? What would happen if we included it (state ==
 (TASK_ATOMICSWITCH | TASK_INTERRUPTIBLE))?
 
 I would tend to think that what we should check is
 xnthread_test_info(XNATOMIC). Or maybe check both, the interruptible
 state and the XNATOMIC info bit.

Actually, neither the info bits nor the task state is sufficiently
synchronized against the gatekeeper yet. We need to hold a shared lock
when testing and resetting the state. I'm not sure yet if that is
fixable given the gatekeeper architecture.
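
As a minimal sketch of the locking discipline meant here (the patch in the
follow-up mail below works out where exactly this belongs):

	xnlock_get_irqsave(&nklock, s);
	if (xnthread_test_info(target, XNATOMIC)) {
		/* Still queued for hardening and not signaled meanwhile,
		   so resuming the shadow in primary mode is safe. */
		xnpod_resume_thread(target, XNRELAX);
	}
	xnlock_put_irqrestore(&nklock, s);

with the signal hook clearing XNATOMIC under the same lock.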

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-12 Thread Jan Kiszka
On 2011-07-12 17:48, Philippe Gerum wrote:
 On Tue, 2011-07-12 at 14:57 +0200, Jan Kiszka wrote:
 On 2011-07-12 14:13, Jan Kiszka wrote:
 On 2011-07-12 14:06, Gilles Chanteperdrix wrote:
 On 07/12/2011 01:58 PM, Jan Kiszka wrote:
 On 2011-07-12 13:56, Jan Kiszka wrote:
 However, this parallel unsynchronized execution of the gatekeeper and
 its target thread leaves an increasingly bad feeling on my side. Did we
 really catch all corner cases now? I wouldn't guarantee that yet.
 Specifically as I still have an obscure crash of a Xenomai thread on
 Linux schedule() on my table.

 What if the target thread woke up due to a signal, continued much
 further on a different CPU, blocked in TASK_INTERRUPTIBLE, and then the
 gatekeeper continued? I wish we could already eliminate this complexity
 and do the migration directly inside schedule()...

 BTW, why do we mask out TASK_ATOMICSWITCH when checking the task state in
 the gatekeeper? What would happen if we included it (state ==
 (TASK_ATOMICSWITCH | TASK_INTERRUPTIBLE))?

 I would tend to think that what we should check is
 xnthread_test_info(XNATOMIC). Or maybe check both, the interruptible
 state and the XNATOMIC info bit.

 Actually, neither the info bits nor the task state is sufficiently
 synchronized against the gatekeeper yet. We need to hold a shared lock
 when testing and resetting the state. I'm not sure yet if that is
 fixable given the gatekeeper architecture.


 This may work (on top of the exit-race fix):

 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index 50dcf43..90feb16 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -913,20 +913,27 @@ static int gatekeeper_thread(void *data)
 if ((xnthread_user_task(target)->state & ~TASK_ATOMICSWITCH) == 
 TASK_INTERRUPTIBLE) {
 rpi_pop(target);
 xnlock_get_irqsave(&nklock, s);
 -#ifdef CONFIG_SMP
 +
  /*
 - * If the task changed its CPU while in
 - * secondary mode, change the CPU of the
 - * underlying Xenomai shadow too. We do not
 - * migrate the thread timers here, it would
 - * not work. For a full migration comprising
 - * timers, using xnpod_migrate_thread is
 - * required.
 + * Recheck XNATOMIC to avoid waking the shadow if the
 + * Linux task received a signal meanwhile.
   */
-if (target->sched != sched)
 -xnsched_migrate_passive(target, sched);
 +if (xnthread_test_info(target, XNATOMIC)) {
 +#ifdef CONFIG_SMP
 +/*
 + * If the task changed its CPU while in
 + * secondary mode, change the CPU of the
 + * underlying Xenomai shadow too. We do not
 + * migrate the thread timers here, it would
 + * not work. For a full migration comprising
 + * timers, using xnpod_migrate_thread is
 + * required.
 + */
 +if (target-sched != sched)
 +xnsched_migrate_passive(target, sched);
  #endif /* CONFIG_SMP */
 -xnpod_resume_thread(target, XNRELAX);
 +xnpod_resume_thread(target, XNRELAX);
 +}
  xnlock_put_irqrestore(nklock, s);
  xnpod_schedule();
  }
 @@ -1036,6 +1043,7 @@ redo:
   * to process this signal anyway.
   */
  if (rthal_current_domain == rthal_root_domain) {
 +XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));
  if (XENO_DEBUG(NUCLEUS)  (!signal_pending(this_task)
  || this_task-state != TASK_RUNNING))
  xnpod_fatal
 @@ -1044,6 +1052,8 @@ redo:
  return -ERESTARTSYS;
  }
  
 +xnthread_clear_info(thread, XNATOMIC);
 +
  /* current is now running into the Xenomai domain. */
  thread-gksched = NULL;
  sched = xnsched_finish_unlocked_switch(thread-sched);
 @@ -2650,6 +2660,8 @@ static inline void do_sigwake_event(struct task_struct 
 *p)
  
  xnlock_get_irqsave(nklock, s);
  
 +xnthread_clear_info(thread, XNATOMIC);
 +
  if ((p-ptrace  PT_PTRACED)  !xnthread_test_state(thread, XNDEBUG)) {
  sigset_t pending;
  

 It totally ignores RPI and PREEMPT_RT for now. RPI is broken anyway,
 
 I want to drop RPI in v3 for sure because it is misleading people. I'm
 still pondering whether we should do that earlier during the 2.6
 timeframe.

That would only leave us with XNATOMIC being used under PREEMPT-RT for
signaling LO_GKWAKE_REQ on schedule out while my patch may clear it on
signal

Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-12 Thread Jan Kiszka
On 2011-07-12 19:31, Gilles Chanteperdrix wrote:
 On 07/12/2011 02:57 PM, Jan Kiszka wrote:
  xnlock_put_irqrestore(nklock, s);
  xnpod_schedule();
  }
 @@ -1036,6 +1043,7 @@ redo:
   * to process this signal anyway.
   */
  if (rthal_current_domain == rthal_root_domain) {
 +XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));
 
 Misleading dead code again, XNATOMIC is cleared not ten lines above.

Nope, I forgot to remove that line.

 
  if (XENO_DEBUG(NUCLEUS)  (!signal_pending(this_task)
  || this_task-state != TASK_RUNNING))
  xnpod_fatal
 @@ -1044,6 +1052,8 @@ redo:
  return -ERESTARTSYS;
  }
  
 +xnthread_clear_info(thread, XNATOMIC);
 
 Why this? I find the xnthread_clear_info(XNATOMIC) right at the right
 place at the point it currently is.

Nope. Now we either clear XNATOMIC after successful migration or when
the signal is about to be sent (ie. in the hook). That way we can test
more reliably (TM) in the gatekeeper if the thread can be migrated.

Jan





Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-12 Thread Jan Kiszka
On 2011-07-12 19:38, Gilles Chanteperdrix wrote:
 On 07/12/2011 07:34 PM, Jan Kiszka wrote:
 On 2011-07-12 19:31, Gilles Chanteperdrix wrote:
 On 07/12/2011 02:57 PM, Jan Kiszka wrote:
xnlock_put_irqrestore(nklock, s);
xnpod_schedule();
}
 @@ -1036,6 +1043,7 @@ redo:
 * to process this signal anyway.
 */
if (rthal_current_domain == rthal_root_domain) {
 +  XENO_BUGON(NUCLEUS, xnthread_test_info(thread, XNATOMIC));

 Misleading dead code again, XNATOMIC is cleared not ten lines above.

 Nope, I forgot to remove that line.


if (XENO_DEBUG(NUCLEUS)  (!signal_pending(this_task)
|| this_task-state != TASK_RUNNING))
xnpod_fatal
 @@ -1044,6 +1052,8 @@ redo:
return -ERESTARTSYS;
}
  
 +  xnthread_clear_info(thread, XNATOMIC);

 Why this? I find the xnthread_clear_info(XNATOMIC) right at the right
 place at the point it currently is.

 Nope. Now we either clear XNATOMIC after successful migration or when
 the signal is about to be sent (ie. in the hook). That way we can test
 more reliably (TM) in the gatekeeper if the thread can be migrated.
 
 Ok for adding the XNATOMIC test, because it improves the robustness, but
 why changing the way XNATOMIC is set and clear? Chances of breaking
 thing while changing code in this area are really high...

The current code is (most probably) broken as it does not properly
synchronize the gatekeeper against a signaled and runaway target
Linux task.

We need an indication of whether a Linux signal will wake up (or already
has woken up) the to-be-migrated task. That task may have continued
executing over its own context, potentially on a different CPU. Providing
this indication is the purpose of changing where XNATOMIC is cleared.
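
Condensed, the gatekeeper side then boils down to something like this (a
sketch distilled from the patch above, not the literal nucleus code):

	xnlock_get_irqsave(&nklock, s);
	if (xnthread_test_info(target, XNATOMIC)) {
		/* XNATOMIC still set: no signal hook fired, the task is
		 * still parked for us, so resuming the shadow is safe. */
		xnpod_resume_thread(target, XNRELAX);
	}
	xnlock_put_irqrestore(&nklock, s);
	xnpod_schedule();

If a signal raced us, XNATOMIC is already clear at this point and the
gatekeeper simply backs off.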

Jan





Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-11 Thread Jan Kiszka
On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:
 @@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct 
 task_struct *p)
  magic = xnthread_get_magic(thread);
  
  xnlock_get_irqsave(nklock, s);
 +
 +gksched = thread-gksched;
 +if (gksched) {
 +xnlock_put_irqrestore(nklock, s);
 
 Are we sure irqs are on here? Are you sure that what is needed is not an
 xnlock_clear_irqon?

We are in the context of do_exit. Not only IRQs are on, also preemption.
And surely no nklock is held.

 Furthermore, I do not understand how we
 synchronize with the gatekeeper, how is the gatekeeper guaranteed to
 wait for this assignment?

The gatekeeper holds the gksync token while it's active. We request it,
thus we wait for the gatekeeper to become idle again. While it is idle,
we reset the queued reference - but I just realized that this may tramp
on other tasks' values. I need to add a check that the value to be
null'ified is actually still ours.
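
In code, the exit path would do roughly this (simplified fragment,
declarations omitted; it assumes gksync is a plain Linux semaphore, which
may not match the actual type):

	gksched = thread->gksched;
	if (gksched) {
		xnlock_put_irqrestore(&nklock, s);

		/* Wait for the gatekeeper to go idle; it holds gksync
		 * while a hardening request is in flight. */
		down(&gksched->gksync);

		/* Drop the queued reference - but only if it is still
		 * ours, so we do not clobber another task's request. */
		if (gksched->gktarget == thread)
			gksched->gktarget = NULL;

		up(&gksched->gksync);

		xnlock_get_irqsave(&nklock, s);
	}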

Jan





Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-11 Thread Jan Kiszka
On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:
 @@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct 
 task_struct *p)
 magic = xnthread_get_magic(thread);
  
 xnlock_get_irqsave(nklock, s);
 +
 +   gksched = thread-gksched;
 +   if (gksched) {
 +   xnlock_put_irqrestore(nklock, s);

 Are we sure irqs are on here? Are you sure that what is needed is not an
 xnlock_clear_irqon?
 
 We are in the context of do_exit. Not only IRQs are on, also preemption.
 And surely no nklock is held.
 
 Furthermore, I do not understand how we
 synchronize with the gatekeeper, how is the gatekeeper guaranteed to
 wait for this assignment?
 
 The gatekeeper holds the gksync token while it's active. We request it,
 thus we wait for the gatekeeper to become idle again. While it is idle,
 we reset the queued reference - but I just realized that this may tramp
 on other tasks' values. I need to add a check that the value to be
 null'ified is actually still ours.

Thinking again, that's actually not a problem: gktarget is only needed
while gksync is zero - but then we won't get hold of it anyway and,
thus, can't cause any damage.

Jan





Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-11 Thread Jan Kiszka
On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:16 PM, Jan Kiszka wrote:
 On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:
 @@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct 
 task_struct *p)
   magic = xnthread_get_magic(thread);
  
   xnlock_get_irqsave(nklock, s);
 +
 + gksched = thread-gksched;
 + if (gksched) {
 + xnlock_put_irqrestore(nklock, s);

 Are we sure irqs are on here? Are you sure that what is needed is not an
 xnlock_clear_irqon?

 We are in the context of do_exit. Not only IRQs are on, also preemption.
 And surely no nklock is held.

 Furthermore, I do not understand how we
 synchronize with the gatekeeper, how is the gatekeeper guaranteed to
 wait for this assignment?

 The gatekeeper holds the gksync token while it's active. We request it,
 thus we wait for the gatekeeper to become idle again. While it is idle,
 we reset the queued reference - but I just realized that this may tramp
 on other tasks' values. I need to add a check that the value to be
 null'ified is actually still ours.

 Thinking again, that's actually not a problem: gktarget is only needed
 while gksync is zero - but then we won't get hold of it anyway and,
 thus, can't cause any damage.
 
 Well, you make it look like it does not work. From what I understand,
 what you want is to set gktarget to null if a task being hardened is
 destroyed. But by waiting for the semaphore, you actually wait for the
 harden to be complete, so setting to NULL is useless. Or am I missing
 something else?

Setting to NULL is probably unneeded but still better than relying on the
gatekeeper never waking up spuriously and then dereferencing a stale
pointer.

The key element of this fix is waiting on gksync, thus on the completion
of the non-RT part of the hardening. Actually, this part usually fails
as the target task received a termination signal at this point.

Jan





Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-11 Thread Jan Kiszka
On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:59 PM, Jan Kiszka wrote:
 On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:16 PM, Jan Kiszka wrote:
 On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:
 @@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct 
 task_struct *p)
 magic = xnthread_get_magic(thread);
  
 xnlock_get_irqsave(nklock, s);
 +
 +   gksched = thread-gksched;
 +   if (gksched) {
 +   xnlock_put_irqrestore(nklock, s);

 Are we sure irqs are on here? Are you sure that what is needed is not an
 xnlock_clear_irqon?

 We are in the context of do_exit. Not only IRQs are on, also preemption.
 And surely no nklock is held.

 Furthermore, I do not understand how we
 synchronize with the gatekeeper, how is the gatekeeper guaranteed to
 wait for this assignment?

 The gatekeeper holds the gksync token while it's active. We request it,
 thus we wait for the gatekeeper to become idle again. While it is idle,
 we reset the queued reference - but I just realized that this may tramp
 on other tasks' values. I need to add a check that the value to be
 null'ified is actually still ours.

 Thinking again, that's actually not a problem: gktarget is only needed
 while gksync is zero - but then we won't get hold of it anyway and,
 thus, can't cause any damage.

 Well, you make it look like it does not work. From what I understand,
 what you want is to set gktarget to null if a task being hardened is
 destroyed. But by waiting for the semaphore, you actually wait for the
 harden to be complete, so setting to NULL is useless. Or am I missing
 something else?

 Setting to NULL is probably unneeded but still better than relying on the
 gatekeeper never waking up spuriously and then dereferencing a stale
 pointer.

 The key element of this fix is waiting on gksync, thus on the completion
 of the non-RT part of the hardening. Actually, this part usually fails
 as the target task received a termination signal at this point.
 
 Yes, but since you wait on the completion of the hardening, the test
 if (target ...) in the gatekeeper code will always be true, because at
 this point the cleanup code will still be waiting for the semaphore.

Yes, except if we ever wake up the gatekeeper later on without an
updated gktarget, i.e. spuriously. Better safe than sorry, this is hairy
code anyway (hopefully obsolete one day).

Jan





Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix race between gatekeeper and thread deletion

2011-07-11 Thread Jan Kiszka
On 2011-07-11 22:09, Gilles Chanteperdrix wrote:
 On 07/11/2011 10:06 PM, Jan Kiszka wrote:
 On 2011-07-11 22:02, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:59 PM, Jan Kiszka wrote:
 On 2011-07-11 21:51, Gilles Chanteperdrix wrote:
 On 07/11/2011 09:16 PM, Jan Kiszka wrote:
 On 2011-07-11 21:10, Jan Kiszka wrote:
 On 2011-07-11 20:53, Gilles Chanteperdrix wrote:
 On 07/08/2011 06:29 PM, GIT version control wrote:
 @@ -2528,6 +2534,22 @@ static inline void do_taskexit_event(struct 
 task_struct *p)
   magic = xnthread_get_magic(thread);
  
   xnlock_get_irqsave(nklock, s);
 +
 + gksched = thread-gksched;
 + if (gksched) {
 + xnlock_put_irqrestore(nklock, s);

 Are we sure irqs are on here? Are you sure that what is needed is not 
 an
 xnlock_clear_irqon?

 We are in the context of do_exit. Not only IRQs are on, also preemption.
 And surely no nklock is held.

 Furthermore, I do not understand how we
 synchronize with the gatekeeper, how is the gatekeeper guaranteed to
 wait for this assignment?

 The gatekeeper holds the gksync token while it's active. We request it,
 thus we wait for the gatekeeper to become idle again. While it is idle,
 we reset the queued reference - but I just realized that this may tramp
 on other tasks' values. I need to add a check that the value to be
 null'ified is actually still ours.

 Thinking again, that's actually not a problem: gktarget is only needed
 while gksync is zero - but then we won't get hold of it anyway and,
 thus, can't cause any damage.

 Well, you make it look like it does not work. From what I understand,
 what you want is to set gktarget to null if a task being hardened is
 destroyed. But by waiting for the semaphore, you actually wait for the
 harden to be complete, so setting to NULL is useless. Or am I missing
 something else?

 Setting to NULL is probably unneeded but still better than relying on the
 gatekeeper never waking up spuriously and then dereferencing a stale
 pointer.

 The key element of this fix is waiting on gksync, thus on the completion
 of the non-RT part of the hardening. Actually, this part usually fails
 as the target task received a termination signal at this point.

 Yes, but since you wait on the completion of the hardening, the test
 if (target ...) in the gatekeeper code will always be true, because at
 this point the cleanup code will still be waiting for the semaphore.

 Yes, except if we ever wake up the gatekeeper later on without an
 updated gktarget, i.e. spuriously. Better safe than sorry, this is hairy
 code anyway (hopefully obsolete one day).
 
 The gatekeeper is not woken up by posting the semaphore, the gatekeeper
 is woken up by the thread which is going to be hardened (and this thread
 is the one which waits for the semaphore).

All true. And what is the point?

Jan





[Xenomai-core] [PATCH 1/2] native: Do not acquire non-existent MPS fastlock

2011-06-30 Thread Jan Kiszka
Fix a build warning at this chance as well.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Applies on top of xenomai-gch.git gch/u_mode

 ksrc/skins/native/task.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/ksrc/skins/native/task.c b/ksrc/skins/native/task.c
index da4e626..af62d56 100644
--- a/ksrc/skins/native/task.c
+++ b/ksrc/skins/native/task.c
@@ -256,7 +256,9 @@ void __native_task_pkg_cleanup(void)
 int rt_task_create(RT_TASK *task,
   const char *name, int stksize, int prio, int mode)
 {
+#if defined(CONFIG_XENO_OPT_NATIVE_MPS)  defined(CONFIG_XENO_FASTSYNCH)
xnarch_atomic_t *fastlock = NULL;
+#endif
union xnsched_policy_param param;
struct xnthread_init_attr attr;
int err = 0, cpumask, cpu;
@@ -336,8 +338,11 @@ int rt_task_create(RT_TASK *task,
xnheap_free(xnsys_ppd_get(0)-sem_heap, fastlock);
 #endif
xnpod_delete_thread(task-thread_base);
-   } else
+   }
+#if defined(CONFIG_XENO_OPT_NATIVE_MPS)  defined(CONFIG_XENO_FASTSYNCH)
+   else
xnsynch_fast_acquire(fastlock, 
xnthread_handle(task-thread_base));
+#endif
 
return err;
 }
-- 
1.7.1



[Xenomai-core] [PATCH 2/2] native: Fix error cleanup of rt_task_create

2011-06-30 Thread Jan Kiszka
When creation of a shadow task fails, rt_task_create has to free the
task object consistently, not only on registry errors. Then we need to
delete the core thread when fastlock allocation failed. Moreover, fix a
double free of the fastlock object which is now released via the delete
hook. Finally, avoid a use-after-release of the fastlock object in
__task_delete_hook.

This fixes heap corruptions when running out of resources.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Applies on top of xenomai-gch.git gch/u_mode

 ksrc/skins/native/syscall.c |4 ++-
 ksrc/skins/native/task.c|   44 --
 2 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/ksrc/skins/native/syscall.c b/ksrc/skins/native/syscall.c
index 17071da..b21705a 100644
--- a/ksrc/skins/native/syscall.c
+++ b/ksrc/skins/native/syscall.c
@@ -184,8 +184,10 @@ static int __rt_task_create(struct pt_regs *regs)
   the platform does not support it. */
 
err = rt_task_create(task, name, 0, prio, XNFPU | XNSHADOW | mode);
-   if (err)
+   if (err) {
+   task = NULL;
goto fail;
+   }
 
/* Apply CPU affinity */
set_cpus_allowed(p, task-affinity);
diff --git a/ksrc/skins/native/task.c b/ksrc/skins/native/task.c
index af62d56..c6e9a22 100644
--- a/ksrc/skins/native/task.c
+++ b/ksrc/skins/native/task.c
@@ -74,16 +74,14 @@ static void __task_delete_hook(xnthread_t *thread)
 
task = thread2rtask(thread);
 
-#if defined(CONFIG_XENO_OPT_NATIVE_MPS)  defined(CONFIG_XENO_FASTSYNCH)
-   xnheap_free(xnsys_ppd_get(0)-sem_heap,
-   task-msendq.fastlock);
-#endif
-
 #ifdef CONFIG_XENO_OPT_NATIVE_MPS
/* The nucleus will reschedule as needed when all the deletion
   hooks are done. */
xnsynch_destroy(task-mrecv);
xnsynch_destroy(task-msendq);
+#ifdef CONFIG_XENO_FASTSYNCH
+   xnheap_free(xnsys_ppd_get(0)-sem_heap, task-msendq.fastlock);
+#endif
 #endif /* CONFIG_XENO_OPT_NATIVE_MPS */
 
xnsynch_destroy(task-safesynch);
@@ -265,11 +263,15 @@ int rt_task_create(RT_TASK *task,
xnflags_t bflags;
spl_t s;
 
-   if (prio  T_LOPRIO || prio  T_HIPRIO)
-   return -EINVAL;
+   if (prio  T_LOPRIO || prio  T_HIPRIO) {
+   err = -EINVAL;
+   goto fail;
+   }
 
-   if (xnpod_asynch_p())
-   return -EPERM;
+   if (xnpod_asynch_p()) {
+   err = -EPERM;
+   goto fail;
+   }
 
bflags = mode  (XNFPU | XNSHADOW | XNSUSP);
 
@@ -283,10 +285,10 @@ int rt_task_create(RT_TASK *task,
attr.stacksize = stksize;
param.rt.prio = prio;
 
-   if (xnpod_init_thread(task-thread_base,
- attr, xnsched_class_rt, param) != 0)
-   /* Assume this is the only possible failure. */
-   return -ENOMEM;
+   err = xnpod_init_thread(task-thread_base, attr, xnsched_class_rt,
+   param);
+   if (err)
+   goto fail;
 
inith(task-link);
task-suspend_depth = (bflags  XNSUSP) ? 1 : 0;
@@ -310,8 +312,10 @@ int rt_task_create(RT_TASK *task,
/* Allocate lock memory for in-kernel use */
fastlock = xnheap_alloc(xnsys_ppd_get(0)-sem_heap,
sizeof(*fastlock));
-   if (fastlock == NULL)
+   if (fastlock == NULL) {
+   xnpod_delete_thread(task-thread_base);
return -ENOMEM;
+   }
 #endif /* CONFIG_XENO_FASTSYNCH */
xnsynch_init(task-mrecv, XNSYNCH_FIFO, NULL);
/*
@@ -333,18 +337,20 @@ int rt_task_create(RT_TASK *task,
   half-baked objects... */
 
err = xnthread_register(task-thread_base, name ? task-rname : );
-   if (err) {
-#if defined(CONFIG_XENO_OPT_NATIVE_MPS)  defined(CONFIG_XENO_FASTSYNCH)
-   xnheap_free(xnsys_ppd_get(0)-sem_heap, fastlock);
-#endif
+   if (err)
xnpod_delete_thread(task-thread_base);
-   }
 #if defined(CONFIG_XENO_OPT_NATIVE_MPS)  defined(CONFIG_XENO_FASTSYNCH)
else
xnsynch_fast_acquire(fastlock, 
xnthread_handle(task-thread_base));
 #endif
 
return err;
+
+  fail:
+   if (xnthread_test_state(task-thread_base, XNSHADOW))
+   xnfree(task);
+
+   return err;
 }
 
 /**
-- 
1.7.1



Re: [Xenomai-core] [PATCH 2/2] native: Fix error cleanup of rt_task_create

2011-06-30 Thread Jan Kiszka
On 2011-06-30 13:09, Gilles Chanteperdrix wrote:
 On 06/30/2011 11:36 AM, Jan Kiszka wrote:
 When creation of a shadow task fails, rt_task_create has to free the
 task object consistently, not only on registry errors. Then we need to
 delete the core thread when fastlock allocation failed. Moreover, fix a
 double free of the fastlock object which is now released via the delete
 hook. Finally, avoid a use-after-release of the fastlock object in
 __task_delete_hook.

 This fixes heap corruptions when running out of resources.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
 (...)
 +
 +  fail:
 +if (xnthread_test_state(task-thread_base, XNSHADOW))
 +xnfree(task);
 +
 +return err;
  }
  
  /**
 
 Is this needed? I mean, shadows are created in syscall.c, function
 __rt_task_create, and when rt_task_create returns an error, that
 function calls rt_task_delete. So, there should be no leak. And worse,
 here rt_task_delete will use an invalid pointer if we apply that patch.

The problem is that rt_task_create may both fail early without any
registration step performed yet and very late when everything (except
the registry entry) is already set up. There is no chance for the
caller __rt_task_create to figure out the precise task registration
state and properly roll it back. That must be done by rt_task_create.

This patch ensures that shadow tasks are either successfully registered
or completely deleted on error. Among other bug scenarios, this avoids
deleting the task twice, first via xnpod_delete_thread+deletion hook and
then again via xnfree in the error path of __rt_task_create.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Allow drop_u_mode syscall from any context

2011-06-29 Thread Jan Kiszka
On 2011-06-28 23:29, Gilles Chanteperdrix wrote:
 On 06/28/2011 11:01 PM, GIT version control wrote:
 Module: xenomai-jki
 Branch: for-upstream
 Commit: 5597470d84584846875e8a35309e6302c768addf
 URL:
 http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5597470d84584846875e8a35309e6302c768addf

 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Tue Jun 28 22:10:07 2011 +0200

 nucleus: Allow drop_u_mode syscall from any context

 xnshadow_sys_drop_u_mode already checks if the caller is a shadow. It
 does that without issuing a warning message if the check fails - in
 contrast to do_hisyscall_event. As user space may call this cleanup
 service even for non-shadow threads (e.g. after shadow creation failed),
 we better silence this warning.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
 Jan,
 
 I have a branch here, which allocates u_mode in the shared heap, so,
 this syscall is about to become unnecessary.
 
 See:
 http://git.xenomai.org/?p=xenomai-gch.git;a=shortlog;h=refs/heads/u_mode

Even better. When do you plan to merge all this? I'd like to finally fix
the various MPS fastlock breakages, specifically as they overlap with
other issues there.

Jan





Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Allow drop_u_mode syscall from any context

2011-06-29 Thread Jan Kiszka
On 2011-06-29 09:25, Gilles Chanteperdrix wrote:
 On 06/29/2011 09:06 AM, Jan Kiszka wrote:
 On 2011-06-28 23:29, Gilles Chanteperdrix wrote:
 On 06/28/2011 11:01 PM, GIT version control wrote:
 Module: xenomai-jki
 Branch: for-upstream
 Commit: 5597470d84584846875e8a35309e6302c768addf
 URL:
 http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5597470d84584846875e8a35309e6302c768addf

 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Tue Jun 28 22:10:07 2011 +0200

 nucleus: Allow drop_u_mode syscall from any context

 xnshadow_sys_drop_u_mode already checks if the caller is a shadow. It
 does that without issuing a warning message if the check fails - in
 contrast to do_hisyscall_event. As user space may call this cleanup
 service even for non-shadow threads (e.g. after shadow creation failed),
 we better silence this warning.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

 Jan,

 I have a branch here, which allocates u_mode in the shared heap, so,
 this syscall is about to become unnecessary.

 See:
 http://git.xenomai.org/?p=xenomai-gch.git;a=shortlog;h=refs/heads/u_mode

 Even better. When do you plan to merge all this? I'd like to finally fix
 the various MPS fastlock breakages, specifically as they overlap with
 other issues there.
 
 I would like to give a chance to Philippe to have a look at it before it
 is merged (especially the commit using an adeos ptd).

OK.

 
 Normally, the MPS fastlock issues are solved in this branch.

Only the previously discussed leak. MPS-disabled is still oopsing, and
error cleanup is also broken - but not only for MPS.

I'll base my fixes on top of your branch.

Jan





Re: [Xenomai-core] posix: only use rt_printf when in primary mode

2011-06-28 Thread Jan Kiszka
Hi Gilles,

dynamic printf backend selection is a suboptimal idea when you want
chronological output (not that uncommon during debugging). The native
printf may actually hit the screen (or the log file) before some earlier
queued rt_printf message because the dump thread is busy or has a lower
priority.

It's ok to do native printf for non-shadows. But once a thread was
mapped, printf needs to go through the rt queue, even in secondary mode.

Jan





Re: [Xenomai-core] posix: only use rt_printf when in primary mode

2011-06-28 Thread Jan Kiszka
On 2011-06-28 22:48, Jan Kiszka wrote:
 Hi Gilles,
 
 dynamic printf backend selection is a suboptimal idea when you want
 chronological output (not that uncommon during debugging). The native
 printf may actually hit the screen (or the log file) before some earlier
 queued rt_printf message because the dump thread is busy or has a lower
 priority.

Oh, I missed the buffer flush - that should work.
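
I.e., a wrapper along these lines should keep the output ordered (sketch
only; is_relaxed_context() is a made-up placeholder for the skin's real
mode check, not an existing helper):

#include <stdio.h>
#include <stdarg.h>
#include <rtdk.h>

static int wrapped_vprintf(const char *fmt, va_list ap)
{
	if (is_relaxed_context()) {
		/* Drain the rt-print ring first so earlier rt_printf
		 * output cannot be overtaken by this native write. */
		rt_print_flush_buffers();
		return vprintf(fmt, ap);
	}

	return rt_vfprintf(stdout, fmt, ap);
}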

Sorry for the noise.

Jan





Re: [Xenomai-core] RFC: use a pool of pre-allocated buffers in rt_printf

2011-06-27 Thread Jan Kiszka
On 2011-06-24 21:58, Gilles Chanteperdrix wrote:
 On 06/23/2011 01:36 PM, Jan Kiszka wrote:
 On 2011-06-23 13:29, Gilles Chanteperdrix wrote:
 On 06/23/2011 11:09 AM, Jan Kiszka wrote:
 On 2011-06-23 11:05, Gilles Chanteperdrix wrote:
 On 06/23/2011 09:38 AM, Jan Kiszka wrote:
 +#ifdef CONFIG_XENO_FASTSYNCH
 +   if (xeno_get_current() != XN_NO_HANDLE 
 +   !(xeno_get_current_mode()  XNRELAX)  buf_pool_get()) {
 +   struct print_buffer *old;
 +
 +   old = buf_pool_get();
 +   while (old != buffer) {
 +   buffer = old;
 +   old = buf_pool_cmpxchg(buffer, 
 buffer-pool_next);

 Though unlikely, it's still possible: The buffer obtained in the last
 round may have been dequeued meanwhile and then freed (in case a third
 buffer was released to the pool, filling it up to pool_capacity).

 I do not get it: it seems to me that if the current head (that is
 buf_pool_get()) is freed, then the cmpxchg will fail, so we will loop
 and try again.

 Problematic is the dereferencing of the stale buffer pointer obtained
 during the last cmpxchg or via buf_pool_get. This happens before the new
 cmpxchg.

 Ok. Got it, that would be a problem only if the stale pointer pointed to
 an unmapped area, but ok, better avoid freeing the buffers. I guess it
 would not be a problem as applications tend to have a fixed number of
 threads anyway.

 That is a risky assumption, proven wrong e.g. by one of our
 applications. Threads may always be created or destroyed depending on
 the mode of an application, specifically if it is a larger one.
 
 Here is another attempt, based on a bitmap.
 
 diff --git a/src/rtdk/rt_print.c b/src/rtdk/rt_print.c
 index 93b711a..fb4820f 100644
 --- a/src/rtdk/rt_print.c
 +++ b/src/rtdk/rt_print.c
 @@ -28,7 +28,9 @@
  #include syslog.h
  
  #include rtdk.h
 +#include nucleus/types.h   /* For BITS_PER_LONG */
  #include asm/xenomai/system.h
 +#include asm/xenomai/atomic.h  /* For atomic_cmpxchg */
  #include asm-generic/stack.h
  
  #define RT_PRINT_BUFFER_ENV  RT_PRINT_BUFFER
 @@ -37,6 +39,9 @@
  #define RT_PRINT_PERIOD_ENV  RT_PRINT_PERIOD
  #define RT_PRINT_DEFAULT_PERIOD  100 /* ms */
  
 +#define RT_PRINT_BUFFERS_COUNT_ENV  RT_PRINT_BUFFERS_COUNT
 +#define RT_PRINT_DEFAULT_BUFFERS_COUNT  4
 +
  #define RT_PRINT_LINE_BREAK  256
  
  #define RT_PRINT_SYSLOG_STREAM   NULL
 @@ -75,6 +80,12 @@ static pthread_mutex_t buffer_lock;
  static pthread_cond_t printer_wakeup;
  static pthread_key_t buffer_key;
  static pthread_t printer_thread;
 +#ifdef CONFIG_XENO_FASTSYNCH
 +static xnarch_atomic_t *pool_bitmap;
 +static unsigned pool_bitmap_len;
 +static unsigned pool_buf_size;
 +static unsigned long pool_start, pool_len;
 +#endif /* CONFIG_XENO_FASTSYNCH */
  
  static void cleanup_buffer(struct print_buffer *buffer);
  static void print_buffers(void);
 @@ -243,10 +254,36 @@ static void set_buffer_name(struct print_buffer 
 *buffer, const char *name)
   }
  }
  
 +void rt_print_init_inner(struct print_buffer *buffer, size_t size)
 +{
 + buffer-size = size;
 +
 + memset(buffer-ring, 0, size);
 +
 + buffer-read_pos  = 0;
 + buffer-write_pos = 0;
 +
 + buffer-prev = NULL;
 +
 + pthread_mutex_lock(buffer_lock);
 +
 + buffer-next = first_buffer;
 + if (first_buffer)
 + first_buffer-prev = buffer;
 + first_buffer = buffer;
 +
 + buffers++;
 + pthread_cond_signal(printer_wakeup);
 +
 + pthread_mutex_unlock(buffer_lock);
 +}
 +
  int rt_print_init(size_t buffer_size, const char *buffer_name)
  {
   struct print_buffer *buffer = pthread_getspecific(buffer_key);
   size_t size = buffer_size;
 + unsigned long old_bitmap;
 + unsigned j;
  
   if (!size)
   size = default_buffer_size;
 @@ -260,39 +297,56 @@ int rt_print_init(size_t buffer_size, const char 
 *buffer_name)
   return 0;
   }
   cleanup_buffer(buffer);
 + buffer = NULL;
   }
  
 - buffer = malloc(sizeof(*buffer));
 - if (!buffer)
 - return ENOMEM;
 +#ifdef CONFIG_XENO_FASTSYNCH
 + /* Find a free buffer in the pool */
 + do {
 + unsigned long bitmap;
 + unsigned i;
  
 - buffer-ring = malloc(size);
 - if (!buffer-ring) {
 - free(buffer);
 - return ENOMEM;
 - }
 - memset(buffer-ring, 0, size);
 + for (i = 0; i  pool_bitmap_len; i++) {
 + old_bitmap = xnarch_atomic_get(pool_bitmap[i]);
 + if (old_bitmap)
 + goto acquire;
 + }
  
 - buffer-read_pos  = 0;
 - buffer-write_pos = 0;
 + goto not_found;
  
 - buffer-size = size;
 +   acquire:
 + do {
 + bitmap = old_bitmap;
 + j = __builtin_ffsl(bitmap) - 1

Re: [Xenomai-core] RFC: use a pool of pre-allocated buffers in rt_printf

2011-06-23 Thread Jan Kiszka
On 2011-06-22 23:55, Gilles Chanteperdrix wrote:
 
 Hi,
 
 I would like to better integrate rtdk and the posix skin by forcibly 
 wrapping the calls to malloc/free and also wrap printf to call 
 rt_printf.
 
 However, currently, rt_printf can not really be used as a drop-in 
 replacement of printf:
 - it is necessary to call rt_printf_auto_init, we can do it in the 
 library init code;
 - the first rt_printf causes a memory allocation, so a potential switch 
 to secondary mode, which is what using rt_printf was trying to avoid in
 the first place.
 
 In order to solve this second issue, and to avoid forcibly creating a 
 buffer for each thread, which would be wasteful, here is a patch, which, 
 following an idea of Philippe, creates a pool of pre-allocated buffers. 
 The pool is handled in a lockless fashion, using xnarch_atomic_cmpxchg 
 (so, will only work with CONFIG_XENO_FASTSYNCH).

Makes sense.

 
 diff --git a/src/rtdk/rt_print.c b/src/rtdk/rt_print.c
 index 93b711a..2ed2209 100644
 --- a/src/rtdk/rt_print.c
 +++ b/src/rtdk/rt_print.c
 @@ -30,6 +30,8 @@
  #include rtdk.h
  #include asm/xenomai/system.h
  #include asm-generic/stack.h
 +#include asm/xenomai/atomic.h
 +#include asm-generic/bits/current.h
  
  #define RT_PRINT_BUFFER_ENV  RT_PRINT_BUFFER
  #define RT_PRINT_DEFAULT_BUFFER  16*1024
 @@ -37,6 +39,9 @@
  #define RT_PRINT_PERIOD_ENV  RT_PRINT_PERIOD
  #define RT_PRINT_DEFAULT_PERIOD  100 /* ms */
  
 +#define RT_PRINT_BUFFERS_COUNT_ENV  RT_PRINT_BUFFERS_COUNT
 +#define RT_PRINT_DEFAULT_BUFFERS_COUNT  4
 +
  #define RT_PRINT_LINE_BREAK  256
  
  #define RT_PRINT_SYSLOG_STREAM   NULL
 @@ -63,6 +68,9 @@ struct print_buffer {
* caching on SMP.
*/
   off_t read_pos;
 +#ifdef CONFIG_XENO_FASTSYNCH
 + struct print_buffer *pool_next;
 +#endif /* CONFIG_XENO_FASTSYNCH */
  };
  
  static struct print_buffer *first_buffer;
 @@ -75,6 +83,17 @@ static pthread_mutex_t buffer_lock;
  static pthread_cond_t printer_wakeup;
  static pthread_key_t buffer_key;
  static pthread_t printer_thread;
 +#ifdef CONFIG_XENO_FASTSYNCH
 +static xnarch_atomic_t buffer_pool;
 +static unsigned pool_capacity;
 +static xnarch_atomic_t pool_count;
 +
 +#define buf_pool_set(buf) xnarch_atomic_set(buffer_pool, (unsigned long)buf)
 +#define buf_pool_get() (struct print_buffer *)xnarch_atomic_get(buffer_pool)
 +#define buf_pool_cmpxchg(oldval, newval) \
 + ((struct print_buffer *)xnarch_atomic_cmpxchg   \
 +  (buffer_pool, (unsigned long)oldval, (unsigned long)newval))

static inlines should work as well and document input/output types a bit
better.
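
E.g. (same semantics as the macros, just with the types spelled out;
untested):

static inline struct print_buffer *buf_pool_get(void)
{
	return (struct print_buffer *)xnarch_atomic_get(&buffer_pool);
}

static inline void buf_pool_set(struct print_buffer *buf)
{
	xnarch_atomic_set(&buffer_pool, (unsigned long)buf);
}

static inline struct print_buffer *
buf_pool_cmpxchg(struct print_buffer *oldval, struct print_buffer *newval)
{
	return (struct print_buffer *)xnarch_atomic_cmpxchg(&buffer_pool,
			(unsigned long)oldval, (unsigned long)newval);
}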

 +#endif /* CONFIG_XENO_FASTSYNCH */
  
  static void cleanup_buffer(struct print_buffer *buffer);
  static void print_buffers(void);
 @@ -243,43 +262,28 @@ static void set_buffer_name(struct print_buffer 
 *buffer, const char *name)
   }
  }
  
 -int rt_print_init(size_t buffer_size, const char *buffer_name)
 +static struct print_buffer *rt_print_init_inner(size_t size)
  {
 - struct print_buffer *buffer = pthread_getspecific(buffer_key);
 - size_t size = buffer_size;
 -
 - if (!size)
 - size = default_buffer_size;
 - else if (size  RT_PRINT_LINE_BREAK)
 - return EINVAL;
 + struct print_buffer *buffer;
  
 - if (buffer) {
 - /* Only set name if buffer size is unchanged or default */
 - if (size == buffer-size || !buffer_size) {
 - set_buffer_name(buffer, buffer_name);
 - return 0;
 - }
 - cleanup_buffer(buffer);
 - }
 + assert_nrt();
  
   buffer = malloc(sizeof(*buffer));
   if (!buffer)
 - return ENOMEM;
 + return NULL;
  
   buffer-ring = malloc(size);
   if (!buffer-ring) {
   free(buffer);
 - return ENOMEM;
 + return NULL;
   }
 + buffer-size = size;
 +
   memset(buffer-ring, 0, size);
  
   buffer-read_pos  = 0;
   buffer-write_pos = 0;
  
 - buffer-size = size;
 -
 - set_buffer_name(buffer, buffer_name);
 -
   buffer-prev = NULL;
  
   pthread_mutex_lock(buffer_lock);
 @@ -294,6 +298,52 @@ int rt_print_init(size_t buffer_size, const char 
 *buffer_name)
  
   pthread_mutex_unlock(buffer_lock);
  
 + return buffer;
 +}
 +
 +int rt_print_init(size_t buffer_size, const char *buffer_name)
 +{
 + struct print_buffer *buffer = pthread_getspecific(buffer_key);
 + size_t size = buffer_size;
 +
 + if (!size)
 + size = default_buffer_size;
 + else if (size  RT_PRINT_LINE_BREAK)
 + return EINVAL;
 +
 + if (buffer) {
 + /* Only set name if buffer size is unchanged or default */
 + if (size == buffer-size || !buffer_size) {
 + set_buffer_name(buffer, buffer_name);
 + return 0;
 + }
 + 

Re: [Xenomai-core] RFC: use a pool of pre-allocated buffers in rt_printf

2011-06-23 Thread Jan Kiszka
On 2011-06-23 11:05, Gilles Chanteperdrix wrote:
 On 06/23/2011 09:38 AM, Jan Kiszka wrote:
 +#ifdef CONFIG_XENO_FASTSYNCH
 +   if (xeno_get_current() != XN_NO_HANDLE 
 +   !(xeno_get_current_mode()  XNRELAX)  buf_pool_get()) {
 +   struct print_buffer *old;
 +
 +   old = buf_pool_get();
 +   while (old != buffer) {
 +   buffer = old;
 +   old = buf_pool_cmpxchg(buffer, buffer-pool_next);

 Though unlikely, it's still possible: The buffer obtained in the last
 round may have been dequeued meanwhile and then freed (in case a third
 buffer was released to the pool, filling it up to pool_capacity).
 
 I do not get it: it seems to me that if the current head (that is
 buf_pool_get()) is freed, then the cmpxchg will fail, so we will loop
 and try again.

Problematic is the dereferencing of the stale buffer pointer obtained
during the last cmpxchg or via buf_pool_get. This happens before the new
cmpxchg.
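
Spelled out on the loop from the patch (the comment marks the window):

	old = buf_pool_get();
	while (old != buffer) {
		buffer = old;
		/*
		 * <-- if another thread pops 'buffer' right here and the
		 * pool logic frees released buffers, the load of
		 * buffer->pool_next below touches freed memory, even
		 * though the subsequent cmpxchg would fail anyway.
		 */
		old = buf_pool_cmpxchg(buffer, buffer->pool_next);
	}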

 
 @@ -346,12 +396,36 @@ static void cleanup_buffer(struct print_buffer 
 *buffer)
  {
 struct print_buffer *prev, *next;
  
 +   assert_nrt();
 +
 pthread_setspecific(buffer_key, NULL);
  
 pthread_mutex_lock(buffer_lock);
  
 print_buffers();

 You also need to unhook the buffer from the active list before returning
 it to the pool.
 
 We cannot do that: we want that, when rt_print_init takes a buffer from
 the pool, it does not need to do any operation which would require
 switching to secondary mode. This includes getting the active list mutex.
 So, the buffers in the pool are also in the active list, but they remain
 empty, so nothing will be printed.

Ah, I see.

Jan





Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage

2011-06-23 Thread Jan Kiszka
On 2011-06-20 19:07, Jan Kiszka wrote:
 On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
 On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
 On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
 I am working on this ppd cleanup issue again, I am asking for help to
 find a fix in -head for all cases where the sys_ppd is needed during
 some cleanup.

 The problem is that when the ppd cleanup is invoked:
 - we have no guarantee that current is a thread from the Xenomai
 application;
 - if it is, current-mm is NULL.

 So, associating the sys_ppd to either current or current-mm does not
 work. What we could do is pass the sys_ppd to all the other ppds cleanup
 handlers, this would fix cases such as freeing mutexes fastlock, but
 that does not help when the sys_ppd is needed during a thread deletion 
 hook.

 I would like to find a solution where simply calling xnsys_ppd_get()
 will work, where we do not have an xnsys_ppd_get for each context, such
 as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
 because it would be too error-prone.

 Any idea anyone?

 The best I could come up with: use a ptd to store the mm currently 
 being cleaned up, so that xnshadow_ppd_get continues to work, even
 in the middle of a cleanup.

 In order to also get xnshadow_ppd_get to work in task deletion hooks 
 (which is needed to avoid the issue at the origin of this thread), we 
 also need to set this ptd upon shadow mapping, so it is still there 
 when reaching the task deletion hook (where current-mm may be NULL). 
 Hence the patch:

 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index b243600..6bc4210 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -65,6 +65,11 @@ int nkthrptd;
  EXPORT_SYMBOL_GPL(nkthrptd);
  int nkerrptd;
  EXPORT_SYMBOL_GPL(nkerrptd);
 +int nkmmptd;
 +EXPORT_SYMBOL_GPL(nkmmptd);
 +
 +#define xnshadow_mmptd(t) ((t)-ptd[nkmmptd])
 +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))
 
 xnshadow_mm() can now return a no longer existing mm. So no user of
 xnshadow_mm should ever dereference that pointer. Thus we had better change
 all those users to treat the return value as a void pointer, e.g.
 
  
  struct xnskin_slot {
  struct xnskin_props *props;
 @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t 
 __user *u_completion,
   * friends.
   */
  xnshadow_thrptd(current) = thread;
 +xnshadow_mmptd(current) = current-mm;
 +
  rthal_enable_notifier(current);
  
  if (xnthread_base_priority(thread) == 0 
 @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
  
  static inline void do_cleanup_event(struct mm_struct *mm)
  {
 +struct task_struct *p = current;
 +struct mm_struct *old;
 +
 +old = xnshadow_mm(p);
 +xnshadow_mmptd(p) = mm;
 +
  ppd_remove_mm(mm, detach_ppd);
 +
 +xnshadow_mmptd(p) = old;
 
 I don't have the full picture yet, but that feels racy: If the context
 over which we clean up that foreign mm is also using xnshadow_mmptd,
 other threads in that process may dislike this temporary change.
 
  }
  
  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
 @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
  {
  if (xnpod_userspace_p())
 -return ppd_lookup(muxid, current-mm);
 +return ppd_lookup(muxid, xnshadow_mm(current) ?: current-mm);
  
  return NULL;
  }
 @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
  sema_init(completion_mutex, 1);
  nkthrptd = rthal_alloc_ptdkey();
  nkerrptd = rthal_alloc_ptdkey();
 +nkmmptd = rthal_alloc_ptdkey();
  
 -if (nkthrptd  0 || nkerrptd  0) {
 +if (nkthrptd  0 || nkerrptd  0 || nkmmptd  0) {
  printk(KERN_ERR Xenomai: cannot allocate PTD slots\n);
  return -ENOMEM;
  }
 diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
 index 6ce75e5..cc86852 100644
 --- a/ksrc/skins/posix/mutex.c
 +++ b/ksrc/skins/posix/mutex.c
 @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
  xnlock_put_irqrestore(nklock, s);
  
  #ifdef CONFIG_XENO_FASTSYNCH
 -/* We call xnheap_free even if the mutex is not pshared; when
 -   this function is called from pse51_mutexq_cleanup, the
 -   sem_heap is destroyed, or not the one to which the fastlock
 -   belongs, xnheap will simply return an error. */
 
 I think this comment is not completely obsolete. It still applies /wrt
 shared/non-shared.
 
  xnheap_free(xnsys_ppd_get(mutex-attr.pshared)-sem_heap,
  mutex-synchbase.fastlock);
  #endif /* CONFIG_XENO_FASTSYNCH */


 
 If we can resolve that potential race, this looks like a nice solution.

We still have to address that ordering issue I almost forgot:
do_cleanup_event runs before do_task_exit_event when terminating the
last task. The former destroys the sem heap, the latter fires the delete
hook which then tries to free

[Xenomai-core] User space stack pre-faulting

2011-06-22 Thread Jan Kiszka
Hi Gilles,

do you remember reasons for only pre-faulting the main thread's stack?
The desire to avoid wasting resources by forcing all stacks into memory?

I've the requirement on my table to provide a generic solution for all
shadow threads. I think this should be possible using pthread_getattr_np
and walking the stack page-wise, but I may miss some pitfall.
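
Roughly along these lines (untested sketch; prefault_own_stack is just a
made-up name, and main-thread/error handling details are omitted):

#define _GNU_SOURCE
#include <pthread.h>
#include <unistd.h>

static void prefault_own_stack(void)
{
	pthread_attr_t attr;
	void *stackaddr;
	size_t stacksize, guardsize = 0;
	long pagesz = sysconf(_SC_PAGESIZE);
	volatile char *sp = (volatile char *)&attr;	/* ~current depth */
	volatile char *p;

	if (pthread_getattr_np(pthread_self(), &attr))
		return;

	if (pthread_attr_getstack(&attr, &stackaddr, &stacksize) == 0) {
		pthread_attr_getguardsize(&attr, &guardsize);

		/* Touch every page between a possible guard area at the
		 * low end and the current frame; everything above is in
		 * use already. Assumes a downward-growing stack. */
		for (p = (volatile char *)stackaddr + guardsize;
		     p < sp - pagesz; p += pagesz)
			*p = 0;
	}

	pthread_attr_destroy(&attr);
}

Each shadow would call this once from its own context, e.g. in the thread
wrapper right before it gets mapped.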

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] User space stack pre-faulting

2011-06-22 Thread Jan Kiszka
On 2011-06-22 12:36, Gilles Chanteperdrix wrote:
 On 06/22/2011 11:26 AM, Jan Kiszka wrote:
 Hi Gilles,

 do you remember reasons for only pre-faulting the main thread's stack?
 The desire to avoid wasting resources by forcing all stacks into memory?

 I've the requirement on my table to provide a generic solution of all
 shadow threads. I think this should be possible using pthread_getattr_np
 and walking the stack page-wise, but I may miss some pitfall.
 
 Last time I checked, only the main thread stack was mapped on demand.
 Other threads have mmaped stacks, which are made present by mlockall,
 so, do not need faulting.

That's definitely not the case in general. Customer has just confirmed
that pre-faulting thread stacks avoids first-access domain switches.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] User space stack pre-faulting

2011-06-22 Thread Jan Kiszka
On 2011-06-22 12:56, Gilles Chanteperdrix wrote:
 On 06/22/2011 12:55 PM, Jan Kiszka wrote:
 On 2011-06-22 12:36, Gilles Chanteperdrix wrote:
 On 06/22/2011 11:26 AM, Jan Kiszka wrote:
 Hi Gilles,

 do you remember reasons for only pre-faulting the main thread's stack?
 The desire to avoid wasting resources by forcing all stacks into memory?

 I've the requirement on my table to provide a generic solution of all
 shadow threads. I think this should be possible using pthread_getattr_np
 and walking the stack page-wise, but I may miss some pitfall.

 Last time I checked, only the main thread stack was mapped on demand.
 Other threads have mmaped stacks, which are made present by mlockall,
 so, do not need faulting.

 That's definitely not the case in general. Customer has just confirmed
 that pre-faulting thread stacks avoids first-access domain switches.
 
 self-contained test case please.

Yes, will check this. Currently distracted again by a higher prio oops :-/.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails

2011-06-20 Thread Jan Kiszka
On 2011-06-19 17:41, Gilles Chanteperdrix wrote:
 Merged your whole branch, but took the liberty to change it a bit
 (replacing the commit concerning unlocked context switches with comments
 changes only, and changing the commit about xntbase_tick).

What makes splmax() redundant for the unlocked context switch case? IMO
that bug is still present.

We can clean up xnintr_clock_handler a bit after the changes, will
follow up with a patch.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage

2011-06-20 Thread Jan Kiszka
On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
 On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
 On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
 I am working on this ppd cleanup issue again, I am asking for help to
 find a fix in -head for all cases where the sys_ppd is needed during
 some cleanup.

 The problem is that when the ppd cleanup is invoked:
 - we have no guarantee that current is a thread from the Xenomai
 application;
 - if it is, current-mm is NULL.

 So, associating the sys_ppd to either current or current-mm does not
 work. What we could do is pass the sys_ppd to all the other ppds cleanup
 handlers, this would fix cases such as freeing mutexes fastlock, but
 that does not help when the sys_ppd is needed during a thread deletion hook.

 I would like to find a solution where simply calling xnsys_ppd_get()
 will work, where we do not have an xnsys_ppd_get for each context, such
 as for instance xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct,
 because it would be too error-prone.

 Any idea anyone?

 The best I could come up with: use a ptd to store the mm currently 
 being cleaned up, so that xnshadow_ppd_get continues to work, even
 in the middle of a cleanup.
 
 In order to also get xnshadow_ppd_get to work in task deletion hooks 
 (which is needed to avoid the issue at the origin of this thread), we 
 also need to set this ptd upon shadow mapping, so it is still there 
 when reaching the task deletion hook (where current-mm may be NULL). 
 Hence the patch:
 
 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index b243600..6bc4210 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -65,6 +65,11 @@ int nkthrptd;
  EXPORT_SYMBOL_GPL(nkthrptd);
  int nkerrptd;
  EXPORT_SYMBOL_GPL(nkerrptd);
 +int nkmmptd;
 +EXPORT_SYMBOL_GPL(nkmmptd);
 +
 +#define xnshadow_mmptd(t) ((t)-ptd[nkmmptd])
 +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))

xnshadow_mm() can now return a no longer existing mm. So no user of
xnshadow_mm should ever dereference that pointer. Thus we had better change
all those users to treat the return value as a void pointer, e.g.

  
  struct xnskin_slot {
   struct xnskin_props *props;
 @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t 
 __user *u_completion,
* friends.
*/
   xnshadow_thrptd(current) = thread;
 + xnshadow_mmptd(current) = current-mm;
 +
   rthal_enable_notifier(current);
  
   if (xnthread_base_priority(thread) == 0 
 @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
  
  static inline void do_cleanup_event(struct mm_struct *mm)
  {
 + struct task_struct *p = current;
 + struct mm_struct *old;
 +
 + old = xnshadow_mm(p);
 + xnshadow_mmptd(p) = mm;
 +
   ppd_remove_mm(mm, detach_ppd);
 +
 + xnshadow_mmptd(p) = old;

I don't have the full picture yet, but that feels racy: If the context
over which we clean up that foreign mm is also using xnshadow_mmptd,
other threads in that process may dislike this temporary change.

  }
  
  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
 @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
  {
   if (xnpod_userspace_p())
 - return ppd_lookup(muxid, current-mm);
 + return ppd_lookup(muxid, xnshadow_mm(current) ?: current-mm);
  
   return NULL;
  }
 @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
   sema_init(completion_mutex, 1);
   nkthrptd = rthal_alloc_ptdkey();
   nkerrptd = rthal_alloc_ptdkey();
 + nkmmptd = rthal_alloc_ptdkey();
  
 - if (nkthrptd  0 || nkerrptd  0) {
 + if (nkthrptd  0 || nkerrptd  0 || nkmmptd  0) {
   printk(KERN_ERR Xenomai: cannot allocate PTD slots\n);
   return -ENOMEM;
   }
 diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
 index 6ce75e5..cc86852 100644
 --- a/ksrc/skins/posix/mutex.c
 +++ b/ksrc/skins/posix/mutex.c
 @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
   xnlock_put_irqrestore(nklock, s);
  
  #ifdef CONFIG_XENO_FASTSYNCH
 - /* We call xnheap_free even if the mutex is not pshared; when
 -this function is called from pse51_mutexq_cleanup, the
 -sem_heap is destroyed, or not the one to which the fastlock
 -belongs, xnheap will simply return an error. */

I think this comment is not completely obsolete. It still applies /wrt
shared/non-shared.

   xnheap_free(xnsys_ppd_get(mutex-attr.pshared)-sem_heap,
   mutex-synchbase.fastlock);
  #endif /* CONFIG_XENO_FASTSYNCH */
 
 

If we can resolve that potential race, this looks like a nice solution.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails

2011-06-20 Thread Jan Kiszka
On 2011-06-20 19:33, Gilles Chanteperdrix wrote:
 On 06/20/2011 06:43 PM, Jan Kiszka wrote:
 On 2011-06-19 17:41, Gilles Chanteperdrix wrote:
 Merged your whole branch, but took the liberty to change it a bit
 (replacing the commit concerning unlocked context switches with comments
 changes only, and changing the commit about xntbase_tick).

 What makes splmax() redundant for the unlocked context switch case? IMO
 that bug is still present.
 
 No, the bug is between my keyboard and chair. On architectures with
 unlocked context switches, the Linux task switch still happens with irqs
 off, only the mm switch happens with irqs on.

Then why do we call xnlock_get_irqsave in
xnsched_finish_unlocked_switch? Why not simply xnlock_get if irqs are
off anyway?

Jan




