Re: [Xenomai-core] [RFC 0/1] Class driver for raw Ethernet packets

2011-09-23 Thread Peter Soetens
On Friday 23 September 2011 13:02:19 Richard Cochran wrote:
 This patch adds a class driver for raw Ethernet drivers under
 Xenomai. The goal is to support industrial protocols such as EtherCAT
 and IEC 61850, where the stack is a user space program needing
 direct access at the packet level. The class driver offers interfaces
 for registration, buffer management, and packet sending/receiving.
 
 Although this patch is a kind of first draft, I already have it working
 on the Freescale P2020 with a real-world application, with very good
 results. I can post a patch series for the gianfar driver in the ipipe
 tree, if anyone is interested.
 
 The user space interface is a character device and not a socket, simply
 because my applications will probably never need fancy socket
 options. The class driver could surely be made to offer a socket
 instead, but I think the character device is sufficient.
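 
 A rough sketch of using such an interface from user space (the device
 node name /dev/rtpacket0 and the one-frame-per-read()/write() semantics
 are assumed here for illustration; they are not taken from the patch):
 
 #include <fcntl.h>
 #include <stdio.h>
 #include <unistd.h>
 
 int main(void)
 {
 	unsigned char frame[1514];	/* one raw Ethernet frame */
 	int fd = open("/dev/rtpacket0", O_RDWR);
 
 	if (fd < 0) {
 		perror("open");
 		return 1;
 	}
 	/* Block until the driver hands us one received frame. */
 	ssize_t len = read(fd, frame, sizeof(frame));
 	if (len > 0)
 		write(fd, frame, len);	/* echo it back onto the wire */
 	close(fd);
 	return 0;
 }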
 
 The class driver is clearly in the wrong directory within the source
 tree, but I put it there just to get started. It really does not fit
 with any of the other drivers, so it would probably need its own place
 under ksrc/drivers.
 
 Thanks in advance for your comments,

How does this relate to rtnet, i.e. why didn't you write an rtnet driver?

Peter



Re: [Xenomai-core] [Xenomai-help] Xenomai v2.5.0 -- Important Notice

2010-01-07 Thread Peter Soetens
On Friday 01 January 2010 12:00:52 Philippe Gerum wrote:
 Here is Xenomai 2.5.0.
...
 http://download.gna.org/xenomai/stable/xenomai-2.5.0.tar.bz2
 
 Happy new year, btw.

A Big Thank You from the user community to all contributors, Philippe, Gilles
and Jan in particular, seems appropriate here, but it hasn't been spoken even
after a week. Therefore, I'll unilaterally, and only this once, assume the role
of spokesman of this world-wide fan club and do the honors:

A Big Thank You, and a happy new year to you too !

Those who concur may say 'Hear, hear', '+1' or 'Me too', depending on their
geographical or social situation.

The Xenomai User Community.



Re: [Xenomai-core] select: native tasks with posix skin mqueues

2009-12-14 Thread Peter Soetens
On Mon, Nov 30, 2009 at 15:20, Peter Soetens pe...@thesourceworks.com wrote:
 On Thu, Nov 5, 2009 at 02:46, Gilles Chanteperdrix
 gilles.chanteperd...@xenomai.org wrote:

 Peter Soetens wrote:
  Hi,
 
  I'm creating my RT threads using the native API and I'm creating
  mqueues, wrapped to the pthread_rt library.
  I can read and write the mqueue (and it goes through Xenomai), but
  when I select() on a receiving mqd_t, the select() call returns that
  there is data available on the mq (it fills in the FD_SET), but keeps
  doing so even when it's empty (the select() is in a loop). Also, it's
  modeswitching like nuts.
 
  I found out that the __wrap_select is correctly called, but returns
  -EPERM. Kernel sources indicate that this is caused by
  pse51_current_thread() alias thread2pthread() returning null. Since
  EPERM is returned to userspace, the __real_select is called from user
  space, causing the mode switches and bad behaviour. This is almost
  certainly the thing that native + RTDM + select() is seeing too.
 
  My mqueues probably only work because mq.c uses
  pse51_current_thread() only in the mq_notify function. I'm guessing that
  mq_notify would also not work in combination with the native skin.
 
  I had two options for fixing this: add an xnselector to the native task
  struct or to the nucleus xnthread_t. I chose the latter, such that
  every skin can use select() + RTDM and migrate gradually to the RTDM
  and/or Posix skin.
  I needed to free the xnselector structure in xnpod_delete_thread(); I
  chose a spot, but it causes a segfault in my native thread (which did
  the select) during program cleanup. Any advice ? Also, maybe we should
  separate select() from the posix skin and put it in a separate place
  (in RTDM as rtdm_select() ?), such that we can start building around
  it (posix just forwards to rtdm_select() then).
 
  A second patch was necessary to return the timeout case properly to
  userspace (independent of first patch).
 
  Tested with native + posix loaded and mq. If you never quit your
  application, this works :-)

 Hi,

 I have included a lightly modified version of this patch on head, I do
 not see any crash.  However, I have some doubts about the current
 implementation: calling xnselector_destroy() opens opportunities for a
 rescheduling, which I am not sure is really what we want in the middle
 of xnpod_delete_thread(). Philippe, what do you think?

 I'll test this patch this week too. It seems you forgot to apply the
 second patch, which can go straight into the 2.4 head, since it fixes
 a bug in the select() wrapping code when timeouts are used. See
 parent's 0002-posix-Fix-__wrap_select-when-timeout-happens.patch

I tested the 2.5 branch with this patch and it works here too.

Peter



[Xenomai-core] Vague bug report(s) on 2.5-head

2009-12-14 Thread Peter Soetens
Hi guys,

I'm using the master branch, 4f42de74 with Linux 2.6.31.1 and the
adeos patch from that tree. My app links with both -lnative and
-lposix with the wrappers (using xeno-config).

I'm experiencing a segfault within pthread_cancel() when calling
rt_task_delete(task) on the main() thread (so it deletes its own
task), which was initialized with rt_task_shadow(). When I omit the delete
call, the application terminates cleanly. When the app isn't linked
with the posix wrappers, there is no segfault either.
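
The crashing pattern, reduced to a sketch (priority and name are made
up; error handling omitted):

#include <native/task.h>

int main(void)
{
	RT_TASK main_task;

	/* Turn the regular main() thread into a Xenomai task. */
	rt_task_shadow(&main_task, "main", 10, 0);

	/* ... real-time work ... */

	/* Deleting our own shadowed task segfaults in pthread_cancel(). */
	rt_task_delete(&main_task);
	return 0;
}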

I didn't have this behaviour 'before' (2.4.10). I don't have crashes
when deleting normal RT-threads created with rt_task_create.

Program received signal SIGSEGV, Segmentation fault.
pthread_cancel (th=1719432289) at pthread_cancel.c:35
35  pthread_cancel.c: No such file or directory.
in pthread_cancel.c
Current language:  auto
The current source language is auto; currently c.
(gdb) bt
#0  pthread_cancel (th=1719432289) at pthread_cancel.c:35
#1  0x74579a94 in rt_task_delete () from /usr/lib/libnative.so.3
#2  0x7766e270 in RTT::os::rtos_task_delete_main
(main_task=0x82ab90) at
/home/kaltan/src/git/orocos-rtt/src/os/xenomai/fosi_internal.cpp:165
#3  0x77669d1d in ~MainThread (this=0x82ab80, __in_chrg=<value
optimized out>) at
/home/kaltan/src/git/orocos-rtt/src/os/MainThread.cpp:55


Maybe related, I also get 'bogus' segfaults in relaxed native task
threads when using the CORBA TAO library. Sometimes a 'throw'
statement causes a segv, sometimes something else causes it. The
identical code running on plain gnulinux is not a problem; the
code is clearly correct. It is also not a problem in 2.4.10, but I can't
currently (while writing this email) reproduce it. I'll keep you
posted on this.

I had these problems first in virtual machines (Sun VBox), but I could
reproduce them on the host too. Please tell me if I should refrain
from testing in a VM guest.

Peter



Re: [Xenomai-core] select: native tasks with posix skin mqueues

2009-11-30 Thread Peter Soetens
On Thu, Nov 5, 2009 at 02:46, Gilles Chanteperdrix
gilles.chanteperd...@xenomai.org wrote:

 Peter Soetens wrote:
  Hi,
 
  I'm creating my RT threads using the native API and I'm creating
  mqueues, wrapped to the pthread_rt library.
  I can read and write the mqueue (and it goes through Xenomai), but
  when I select() on a receiving mqd_t, the select() call returns that
  there is data available on the mq (it fills in the FD_SET), but keeps
  doing so even when it's empty (the select() is in a loop). Also, it's
  modeswitching like nuts.
 
  I found out that the __wrap_select is correctly called, but returns
  -EPERM. Kernel sources indicate that this is caused by
  pse51_current_thread() alias thread2pthread() returning null. Since
  EPERM is returned to userspace, the __real_select is called from user
  space, causing the mode switches and bad behaviour. This is almost
  certainly the thing that native + RTDM + select() is seeing too.
 
  My mqueues probably only work because mq.c uses
  pse51_current_thread() only in the mq_notify function. I'm guessing that
  mq_notify would also not work in combination with the native skin.
 
  I had two options for fixing this: add an xnselector to the native task
  struct or to the nucleus xnthread_t. I chose the latter, such that
  every skin can use select() + RTDM and migrate gradually to the RTDM
  and/or Posix skin.
  I needed to free the xnselector structure in xnpod_delete_thread(); I
  chose a spot, but it causes a segfault in my native thread (which did
  the select) during program cleanup. Any advice ? Also, maybe we should
  separate select() from the posix skin and put it in a separate place
  (in RTDM as rtdm_select() ?), such that we can start building around
  it (posix just forwards to rtdm_select() then).
 
  A second patch was necessary to return the timeout case properly to
  userspace (independent of first patch).
 
  Tested with native + posix loaded and mq. If you never quit your
  application, this works :-)

 Hi,

 I have included a lightly modified version of this patch on head, I do
 not see any crash.  However, I have some doubts about the current
 implementation: calling xnselector_destroy() opens opportunities for a
 rescheduling, which I am not sure is really what we want in the middle
 of xnpod_delete_thread(). Philippe, what do you think?

I'll test this patch this week too. It seems you forgot to apply the
second patch, which can go straight into the 2.4 head, since it fixes
a bug in the select() wrapping code when timeouts are used. See
parent's 0002-posix-Fix-__wrap_select-when-timeout-happens.patch

Peter



Re: [Xenomai-core] [Xenomai-help] select: native tasks with posix skin mqueues

2009-10-02 Thread Peter Soetens
On Thu, Oct 1, 2009 at 17:34, Gilles Chanteperdrix
gilles.chanteperd...@xenomai.org wrote:
 Peter Soetens wrote:
 On Thu, Oct 1, 2009 at 16:47, Gilles Chanteperdrix
 gilles.chanteperd...@xenomai.org wrote:
 Peter Soetens wrote:
 Hi,

 I'm creating my RT threads using the native API and I'm creating
 mqueues, wrapped to the pthread_rt library.
 I can read and write the mqueue (and it goes through Xenomai), but
 when I select() on a receiving mqd_t, the select() call returns that
 there is data available on the mq (it fills in the FD_SET), but keeps
 doing so even when it's empty (the select() is in a loop). Also, it's
 modeswitching like nuts.

 I found out that the __wrap_select is correctly called, but returns
 -EPERM. Kernel sources indicate that this is caused by
 pse51_current_thread() alias thread2pthread() returning null. Since
 EPERM is returned to userspace, the __real_select is called from user
 space, causing the mode switches and bad behaviour. This is almost
 certainly the thing that native + RTDM + select() is seeing too.

 My mqueues probably only work because mq.c uses
 pse51_current_thread() only in the mq_notify function. I'm guessing that
 mq_notify would also not work in combination with the native skin.

 I had two options for fixing this: add an xnselector to the native task
 struct or to the nucleus xnthread_t. I chose the latter, such that
 every skin can use select() + RTDM and migrate gradually to the RTDM
 and/or Posix skin.
 I needed to free the xnselector structure in xnpod_delete_thread(); I
 chose a spot, but it causes a segfault in my native thread (which did
 the select) during program cleanup. Any advice ? Also, maybe we should
 separate select() from the posix skin and put it in a separate place
 (in RTDM as rtdm_select() ?), such that we can start building around
 it (posix just forwards to rtdm_select() then).

 A second patch was necessary to return the timeout case properly to
 userspace (independent of first patch).

 Tested with native + posix loaded and mq. If you never quit your
 application, this works :-)

 (maybe we should discuss this further on xenomai-core)
 Ok. Got it now. My idea was that the nucleus service xnselect could be
 used to implement select-like services which would have different
 semantics depending on the skin.

 So, the select service with posix semantics was reserved to posix skin
 threads.

 Yes. The segfaults I'm seeing are not related to the cleanup of my
 xnselector struct in xnpod_delete_thread, because removing the
 cleanup code still leads to the segfault. Probably Posix does
 something special to let the thread leave select() earlier.

 To know whether the bug comes from your code or from an unseen bug in
 the xnselect implementation (there is a suspicious access to the xnselector
 structure when waking up), could you try the same test with the original
 support, simply using posix skin threads?

Will do next week.


 Other than that, the support is really tied to the posix skin: this
 version of select will only accept file descriptors which were returned
 by the posix skin or the rtdm skin. So, I am afraid making it a generic
 service is a bit hard.

I don't have as clear a view as you have, but being able to
use select() on the RTDM user API while using any skin other than
Posix looks like a big plus to me. I thought this was also in line
with what Philippe was thinking of, i.e. to center our file descriptors
around RTDM. Moving select() to RTDM, and having the Posix skin as
the first 'compatible with RTDM' skin, changes the perspective for the
better IMHO.

I'm not looking to make select() generic for all skins, but to
adapt the existing skins to opt in to using 'rtdm_select()', which
is something we can merge in gradually. Do you think this is
feasible/desirable?

Peter



[Xenomai-core] select: native tasks with posix skin mqueues

2009-10-01 Thread Peter Soetens
Hi,

I'm creating my RT threads using the native API and I'm creating
mqueues, wrapped to the pthread_rt library.
I can read and write the mqueue (and it goes through Xenomai), but
when I select() on a receiving mqd_t, the select() call returns that
there is data available on the mq (it fills in the FD_SET), but keeps
doing so even when it's empty (the select() is in a loop). Also, it's
modeswitching like nuts.
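
The pattern in question looks roughly like this (a sketch, not the
actual application; queue name and sizes are made up):

#include <fcntl.h>
#include <mqueue.h>
#include <sys/select.h>

void receive_loop(void)
{
	struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 64 };
	mqd_t mq = mq_open("/data", O_RDONLY | O_CREAT, 0666, &attr);
	char buf[64];
	fd_set rfds;

	for (;;) {
		FD_ZERO(&rfds);
		FD_SET(mq, &rfds);
		/* Reports the queue as ready even when it is empty. */
		if (select(mq + 1, &rfds, NULL, NULL, NULL) > 0 &&
		    FD_ISSET(mq, &rfds))
			mq_receive(mq, buf, sizeof(buf), NULL);
	}
}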

I found out that the __wrap_select is correctly called, but returns
-EPERM. Kernel sources indicate that this is caused by
pse51_current_thread() alias thread2pthread() returning null. Since
EPERM is returned to userspace, the __real_select is called from user
space, causing the mode switches and bad behaviour. This is almost
certainly the thing that native + RTDM + select() is seeing too.
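
For readers unfamiliar with the wrapping: the posix skin relies on GNU
ld's --wrap option (-Wl,--wrap,select), which routes every call to
select() to __wrap_select(), while __real_select() reaches the original
libc symbol. Schematically, and only as an illustration of the fallback
behaviour described above (this is not the actual skin source):

#include <sys/select.h>
#include <errno.h>

int __real_select(int nfds, fd_set *rfds, fd_set *wfds,
		  fd_set *efds, struct timeval *tv);

int __wrap_select(int nfds, fd_set *rfds, fd_set *wfds,
		  fd_set *efds, struct timeval *tv)
{
	/* A real-time syscall would be issued here; -EPERM stands for
	   "the caller is not a thread the skin knows about". */
	int err = -EPERM;

	if (err == -EPERM)	/* fall back to the plain Linux call */
		return __real_select(nfds, rfds, wfds, efds, tv);

	if (err < 0) {
		errno = -err;
		return -1;
	}
	return err;
}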

My mqueues probably only work because mq.c uses
pse51_current_thread() only in the mq_notify function. I'm guessing that
mq_notify would also not work in combination with the native skin.

I had two options for fixing this: add an xnselector to the native task
struct or to the nucleus xnthread_t. I chose the latter, such that
every skin can use select() + RTDM and migrate gradually to the RTDM
and/or Posix skin.
I needed to free the xnselector structure in xnpod_delete_thread(); I
chose a spot, but it causes a segfault in my native thread (which did
the select) during program cleanup. Any advice ? Also, maybe we should
separate select() from the posix skin and put it in a separate place
(in RTDM as rtdm_select() ?), such that we can start building around
it (posix just forwards to rtdm_select() then).

A second patch was necessary to return the timeout case properly to
userspace (independent of first patch).

Tested with native + posix loaded and mq. If you never quit your
application, this works :-)

(maybe we should discuss this further on xenomai-core)

Thanks for the wonderful XUM-2009 experience, btw!

Peter
From 0380298181f2926e6abe05e6b3d0b02389892a7c Mon Sep 17 00:00:00 2001
From: Peter Soetens pe...@thesourceworks.com
Date: Thu, 1 Oct 2009 15:57:54 +0200
Subject: [PATCH] Move posix selector in nucleus for every skin to use.
 This patch makes the select implementation in syscall.c independent of the posix skin.

---
 include/nucleus/thread.h   |    3 +++
 ksrc/nucleus/pod.c         |    6 ++++++
 ksrc/nucleus/thread.c      |    2 ++
 ksrc/skins/posix/syscall.c |    6 +++---
 ksrc/skins/posix/thread.c  |    7 -------
 ksrc/skins/posix/thread.h  |    4 ----
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/nucleus/thread.h b/include/nucleus/thread.h
index 3eefb53..88a14cc 100644
--- a/include/nucleus/thread.h
+++ b/include/nucleus/thread.h
@@ -142,6 +142,7 @@ struct xnthread;
 struct xnsched;
 struct xnsynch;
 struct xnrpi;
+struct xnselector;
 
 typedef struct xnthrops {
 
@@ -208,6 +209,8 @@ typedef struct xnthread {
 	xnstat_exectime_t lastperiod; /* Interval marker for execution time reports */
 } stat;
 
+	struct xnselector *selector;	/* For select. */
+
 int errcode;		/* Local errno */
 
 xnasr_t asr;		/* Asynchronous service routine */
diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 9348ce1..2dd08ca 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -41,6 +41,7 @@
 #include <nucleus/registry.h>
 #include <nucleus/module.h>
 #include <nucleus/stat.h>
+#include <nucleus/select.h>
 #include <asm/xenomai/bits/pod.h>
 
 /* debug support */
@@ -1204,6 +1205,11 @@ void xnpod_delete_thread(xnthread_t *thread)
 	xntimer_destroy(&thread->rtimer);
 	xntimer_destroy(&thread->ptimer);
 
+	if (thread->selector) {
+		xnselector_destroy(thread->selector);
+		thread->selector = NULL;
+	}
+
 	if (xnthread_test_state(thread, XNPEND))
 		xnsynch_forget_sleeper(thread);
 
diff --git a/ksrc/nucleus/thread.c b/ksrc/nucleus/thread.c
index 5fceaec..ed9722e 100644
--- a/ksrc/nucleus/thread.c
+++ b/ksrc/nucleus/thread.c
@@ -95,6 +95,8 @@ int xnthread_init(xnthread_t *thread,
 	thread->asr = XNTHREAD_INVALID_ASR;
 	thread->asrlevel = 0;
 
+	thread->selector = NULL;
+
 	thread->iprio = prio;
 	thread->bprio = prio;
 	thread->cprio = prio;
diff --git a/ksrc/skins/posix/syscall.c b/ksrc/skins/posix/syscall.c
index d936f21..f4afd29 100644
--- a/ksrc/skins/posix/syscall.c
+++ b/ksrc/skins/posix/syscall.c
@@ -2382,12 +2382,12 @@ static int __select(struct task_struct *curr, struct pt_regs *regs)
 	xntmode_t mode = XN_RELATIVE;
 	struct xnselector *selector;
 	struct timeval tv;
-	pthread_t thread;
+	xnthread_t* thread;
 	int i, err, nfds;
 	size_t fds_size;
 
-	thread = pse51_current_thread();
-	if (!thread)
+	thread = xnpod_current_thread();
+	if ( !thread )
 		return -EPERM;
 
 	if (__xn_reg_arg5(regs)) {
diff --git a/ksrc/skins/posix/thread.c b/ksrc/skins/posix/thread.c
index ad4aaa5..6e89625 100644
--- a/ksrc/skins/posix/thread.c
+++ b/ksrc/skins/posix/thread.c
@@ -78,12 +78,6 @@ static void thread_delete_hook(xnthread_t *xnthread)
 	pse51_mark_deleted(thread);
 	pse51_signal_cleanup_thread(thread

Re: [Xenomai-core] RFC: 2.5 todo list.

2009-09-30 Thread Peter Soetens
On Tue, Sep 29, 2009 at 19:31, Gilles Chanteperdrix
gilles.chanteperd...@xenomai.org wrote:

 Hi guys,

 full of energy after this tremendous first XUM, I would like to start a
 discussion about what people would like to see in the 2.5 branch.

So if we answer positively, we'll delay the release? I'd rather get
2.5 out and develop any new stuff on 2.6. I would also expect this
list (or part of it) to go to xenomai-help too.


 Here is a first list, please feel free to criticize it:
 - signals in primary domain (something that we almost forgot)

I refrain from using signals in my apps. They only cause disaster when
using 3rd party libraries. Only Ctrl-C (quit) and debugger signals are
used, and a switch from primary to secondary is perfectly acceptable
in these two cases.

 - xnsynch_acquire using atomic_cmpxchg unconditionally (no #ifdefs)

This is too core for me.

 - statistics of all mapped named heaps in /proc/xenomai/heap

I don't use heaps since we do everything in user space (and I had the
impression that the heap was for kernel-to-user sharing).

 - unwrapped access to user-space posix skin methods

I wouldn't know why I'd need this. Do you mean we'd link with libpthread
instead of libpthread_rt?

 - fast semaphores in user-space

I don't know why I wouldn't need this.

 - syscall-less select ?

Since a syscall is not per se bad (?), I also don't see what there is to win here.


 Actually, there are already a lot of things.
 So, what do you think?

I'm foremost concerned with stability and, to a lesser extent,
performance. I regard every feature change as changing those two
criteria for the worse (unless it's a feature that fixes a bug).

The kernel and libc are already moving targets which influence
Xenomai, so we already have to cope with more changes than we want to.

Peter (non-authoritative, non-developer)



Re: [Xenomai-core] RFC: 2.5 todo list.

2009-09-30 Thread Peter Soetens
On Wed, Sep 30, 2009 at 16:27, Gilles Chanteperdrix
gilles.chanteperd...@xenomai.org wrote:
 Peter Soetens wrote:
 On Tue, Sep 29, 2009 at 19:31, Gilles Chanteperdrix
 gilles.chanteperd...@xenomai.org wrote:
 Hi guys,

 full of energy after this tremendous first XUM, I would like to start a
 discussion about what people would like to see in the 2.5 branch.

 So if we answer positively, we'll delay the release? I'd rather get
 2.5 out and develop any new stuff on 2.6. I would also expect this
 list (or part of it) to go to xenomai-help too.

 The facts are:
 - our release cycle is long;
 - we want to keep the ABI stable for each branch.
 So, anything that we want soon and that breaks the ABI should be done
 in the 2.5 branch; otherwise it will have to wait for 2.6.

Ok, but there is stuff in 2.5 I want soon too, which you would be delaying.



 Here is a first list, please feel free to criticize it:
 - signals in primary domain (something that we almost forgot)

 I refrain from using signals in my apps. They only cause disaster when
 using 3rd party libraries. Only Ctrl-C (quit) and debugger signals are
 used, and a switch from primary to secondary is perfectly acceptable
 in these two cases.

 Yes, signals is a bit of a misnomer; what we actually want is for the
 kernel to be able to cause the execution of an asynchronous callback in
 user-space. For the native skin, it would be for the implementation of
 some hooks. For the posix skin, it would be for the implementation of
 signals. The implementation of posix timers is based on signals (except
 for SIGEV_THREAD, but who uses SIGEV_THREAD in an rt app...), and having
 them cause a switch to secondary mode makes them unusable for practical
 purposes. So, with the current version of the Xenomai posix skin, you have
 to implement your own timer method, having for instance a thread which
 nanosleep()s until the next timer expiry and then executes the callback.

Ok, but I don't use posix timers for the reasons above. I use
clock_nanosleep instead, which offers the same functionality.
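
For reference, such a periodic loop built on clock_nanosleep() with an
absolute deadline looks roughly like this (a generic POSIX sketch; the
1 ms period and the clock choice are illustrative):

#include <time.h>

#define PERIOD_NS 1000000	/* 1 ms */

void periodic_loop(void (*callback)(void))
{
	struct timespec next;

	clock_gettime(CLOCK_MONOTONIC, &next);
	for (;;) {
		next.tv_nsec += PERIOD_NS;
		if (next.tv_nsec >= 1000000000) {
			next.tv_nsec -= 1000000000;
			next.tv_sec++;
		}
		/* Sleep until the absolute deadline, then fire. */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		callback();
	}
}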



 - xnsynch_acquire using atomic_cmpxchg unconditionally (no #ifdefs)

 This is too core for me.

 - statistics of all mapped named heaps in /proc/xenomai/heap

 I don't use heaps since we do everything in user space (and I had the
 impression that the heap was for kernel-to-user sharing).

  - unwrapped access to user-space posix skin methods

 I wouldn't know why I'd need this. Do you mean we'd link with libpthread
 instead of libpthread_rt?

 Well, the wrap thing is a bit cumbersome. And having the calls be named
 with exactly the posix name is useful only if you intend to compile
 exactly the same code for xenomai and other posix systems. Otherwise,
 you could decide to use a little prefix or suffix for each posix skin
 service, and avoid the wrapping clumsiness.

So like we did in the RTAI days. Maybe we can use an rt_ prefix by (safe!)
default and allow a #define in case the user wants to use the
wrapping and is aware that it needs to be enabled during
linking.



 - fast semaphores in user-space

 I don't know why I wouldn't need this.

 - syscall-less select ?

 Since a syscall is not per se bad (?), I also don't see what there is to win here.

 Syscalls are expensive (which is why we do syscall-less mutexes, for
 instance). The idea would be to put the bitfield with the ready file
 descriptors in a shared heap, to avoid going for the syscall if fds are
 already ready when entering select(). The scenario where we would gain
 is on a loaded system, which is exactly when we want to avoid useless
 syscalls.
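
Conceptually, the fast path would look something like this (all names
are hypothetical, just to illustrate the idea quoted above):

#include <string.h>
#include <sys/select.h>

#define MAX_FDS 256
#define BITS_PER_LONG (8 * sizeof(unsigned long))

struct shared_fdset {
	unsigned long ready[MAX_FDS / BITS_PER_LONG];
};

extern struct shared_fdset *shm;	/* mapped from the shared heap */
extern int select_syscall(int nfds, fd_set *rfds);	/* the slow path */

int fast_select(int nfds, fd_set *rfds)
{
	fd_set out;
	int fd, hits = 0;

	FD_ZERO(&out);
	for (fd = 0; fd < nfds; fd++)
		if (FD_ISSET(fd, rfds) &&
		    (shm->ready[fd / BITS_PER_LONG] >> (fd % BITS_PER_LONG)) & 1UL) {
			FD_SET(fd, &out);
			hits++;
		}
	if (hits) {		/* something already ready: no syscall */
		memcpy(rfds, &out, sizeof(out));
		return hits;
	}
	return select_syscall(nfds, rfds);	/* trap into the kernel */
}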

Then I'm tempted to be in favour, although I'd like to confirm first
that select() is not broken as it is now. Are syscalls expensive
because I'm running Xenomai, or is this the case in vanilla Linux too?
Do we try to be better than Linux (until they use a similar 'fix' in
libc)?



 Actually, there are already a lot of things.
 So, what do you think?

 I'm foremost concerned with stability and, to a lesser extent,
 performance. I regard every feature change as changing those two
 criteria for the worse (unless it's a feature that fixes a bug).

 Well... I disagree. Even when fixing bugs we can introduce other bugs.
 What matters if you aim for stability and performance is improving the
 tests, not avoiding modifications.

You got me. But until the tests are improved, I beg you to be careful ;-)

Peter



Re: [Xenomai-core] Comedi drivers in Xenomai porting/integration status ?

2009-02-17 Thread Peter Soetens
On Tuesday 17 February 2009 00:00:00 Alexis Berlemont wrote:
 Hi,

  Hello all! I would like to know what is the current status of the Comedi
  port to Xenomai.
 
  Should all the specific Comedi drivers (ni_pcimio, ni_mite) be available
  for testing (by me or someone with a supported DAQ card) and (if ok) for
  further integration?

 I am still working on that port. It is a long task, and I am wondering at
 each line whether I should rewrite any part of the code which does not comply
 with common coding constraints. Unfortunately, I currently do not have a
 lot of spare time. Anyway, most of the NI subdevice drivers have been
 ported (mite, tio, mio, 8255). I am trying to finalize the global driver
 port.

 By the way, in the middle of January, I noticed that the legacy Comedi
 branch found its way into the mainline (through the staging tree). I do not
 know what the future of such a package in mainline will be. I assume the
 main goal is the definition of a global framework for acquisition boards,
 like V4L2 is for video cards.

I'm not sure I understand where this is going. We did a review of the 
Xenomai/Comedi code integration a few weeks ago. 

These are the facts we observed:

* The Xenomai/Comedi port breaks the complete Comedi API, user space *and* 
kernel space. (We thought/assumed that only the user space interface would go 
over RTDM and that once that was done, the kernel modules could be almost 
copy/pasted into the new framework.)
* The Xenomai/Comedi port is not supported by 'upstream' (what you call 
'legacy'). It's not discussed on their ML, they don't send in patches or 
feedback. 
* There aren't any (?) device drivers ported to the Xenomai/Comedi project 
(public trunk)

This is what we concluded:

* Xenomai/Comedi has no future as long as it ignores (or is ignored by) 
upstream. Even after a port of a device driver, pulling fixes from upstream 
will be hard due to the changed kernel API.
* As GKH puts it: all device drivers belong in the Linux kernel. Upstream is
doing this right now, which makes acceptance of Xenomai/Comedi unlikely, which
makes its life expectancy uncertain.
* We're now actually considering Preempt/RT as the kernel to use in 
combination with the original Comedi. We might be stupid, but then again, it 
might just work.
* We believe the name Xenomai/Comedi is strongly misleading. It suggests a
painless transition path, but it's a completely different software project,
with different interfaces and different maintainer(s?).


Sorry for flaming, and please correct me where I'm wrong.

Peter
-- 
Peter Soetens -- FMTC -- http://www.fmtc.be



Re: [Xenomai-core] Comedi drivers in Xenomai porting/integration status ?

2009-02-17 Thread Peter Soetens
On Tuesday 17 February 2009 10:41:10 Jan Kiszka wrote:
 Peter Soetens wrote:
  These are the facts we observed:
 
  * The Xenomai/Comedi port breaks the complete Comedi API, user space
  *and* kernel space. (We thought/assumed that only the user space
  interface would go over RTDM and that once that was done, the kernel
  modules could be almost copy/pasted into the new framework.)

 Maybe you have a list of the major differences. Then please share it so
 that the motivation can be discussed here and maybe clarified (it's a
 blurred topic for me as well).

Damn. I should have posted back then :-) Our main lead was the Doxygen pages,
from which we went on to see how things were done in code. Unfortunately (?),
I'm not a Comedi developer; I have twiddled with only one Comedi driver. Once.
I can't really compare them down to the bone. But it was immediately clear
that both the user API and the kernel API were different.

The user ('Library') API was cleaned up and streamlined; we could live with
that for new applications. I'm sure there are still issues, but they'll only
come up once people start using this branch. I like the separation between a
low-level 'instruction' API and a high-level 'function' API, something
upstream comedi mixes too much.

For the kernel ('Driver') API, a new data transfer mechanism is in place,
which requires 'porting' all drivers. I genuinely can't estimate how
drastically this changes existing drivers, but the API is quite large and
works with the 'inversion of control' paradigm: each driver must implement a
series of hooks, and the Comedi/RTDM framework calls these when necessary.
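
To give an idea of what that means for a driver author (these names are
hypothetical and do not reflect the actual Comedi/RTDM API):

#include <stddef.h>

struct daq_device;

struct daq_driver_ops {
	/* The framework calls these hooks when it needs to. */
	int  (*attach)(struct daq_device *dev);
	void (*detach)(struct daq_device *dev);
	int  (*trigger)(struct daq_device *dev, unsigned int subdevice);
	int  (*read_sample)(struct daq_device *dev, void *buf, size_t len);
};

/* A driver fills in the hook table and registers it with the core. */
static int my_attach(struct daq_device *dev) { return 0; }
static void my_detach(struct daq_device *dev) { }

static struct daq_driver_ops my_board_ops = {
	.attach = my_attach,
	.detach = my_detach,
	/* .trigger / .read_sample added as the board supports them */
};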

Some other facts I shouldn't have omitted:

* The Xenomai/Comedi layer is very well documented and allows anyone to learn
from it, even the upstream maintainers.
* From my limited device driver knowledge, the technical implementation
looks ok for synchronous/asynchronous reading and writing. Memory-mapped IO is
not available, it seems, and the classical comedi_config, inp, outp, ... family
isn't complete yet either.


  * The Xenomai/Comedi port is not supported by 'upstream' (what you call
  'legacy'). It's not discussed on their ML, they don't send in patches or
  feedback.
  * There aren't any (?) device drivers ported to the Xenomai/Comedi
  project (public trunk)
 
  This is what we concluded:
 
  * Xenomai/Comedi has no future as long as it ignores (or is ignored by)
  upstream. Even after a port of a device driver, pulling fixes from
  upstream will be hard due to the changed kernel API.

 IMHO, that heavily depends on the use cases both projects are able to
 cover. If there are major RT design issues upstream that this variant
 solves, then people may be willing to live with the differences
 (including a smaller driver set downstream). It wouldn't be the first time.

I see the short-term benefits as well. I fear a maintenance nightmare in
the long term.


  * As GKH puts it: all device drivers belong in the Linux kernel. Upstream
  is doing this right now, which makes acceptance of Xenomai/Comedi
  unlikely, which makes its life expectations uncertain.

 I don't think anyone expects to see the RT drivers here and around in
 mainline Linux in the foreseeable future. That's not the primary goal.
 At the same time, it's still unclear how serious mainline is about RT
 redesigns of existing drivers or frameworks. And I recall from earlier
 threads on the comedi list that at least the current comedi maintainers
 consider RT use at best as a niche and not an important scenario.

That's what I recall as well. But I was wondering how the RT issue was 
overtaken by reality: running plain comedi in a preemptible kernel.


  * We're now actually considering Preempt/RT as the kernel to use in
  combination with the original Comedi. We might be stupid, but then again,
  it might just work.
  * We believe the name Xenomai/Comedi is strongly misleading.  It suggests
  a painless transition path, but it's a completely different software
  project, different interfaces, different maintainer(s ?).

 I agree. If reasons for significant differences in the user API remain,
 then a different name would be appropriate, too. Looks like this is not
 Socket-CAN vs. RT-Socket-CAN here?

Most functions and structs have been renamed and have modified arguments,
although there could be a 1:1 mapping (1 old function -> 1 new function).

I believe Alexis can defend his design better than anyone else, and it's not 
the design I wanted to tackle. It's how he plans to maintain it.

Peter

-- 
Peter Soetens -- FMTC -- http://www.fmtc.be



Re: [Xenomai-core] [FIX] Summary: Xenomai 2.3.2 and 2.4 lock-ups and OOPSes

2007-09-17 Thread Peter Soetens
On Saturday 15 September 2007 20:52:59 Philippe Gerum wrote:
 On Fri, 2007-09-07 at 11:27 +0200, Peter Soetens wrote:
  Just in case you dropped out of the long discussion about the issues we found
  from Xenomai 2.3.2 on:
 
   o We are using the xeno_native skin, creating Xeno tasks and semaphores,
  but have strong indications that the crashes are caused by the memory
  allocation scheme of Xenomai in combination with task creation/deletion
o We found two ways to break Xenomai, causing a 'Killed'
  (rt_task_delete) and causing an OOPS (rt_task_join).
o They happen on 2.6.20 and 2.6.22 kernels
   o On the 2.3 branch, r2429 works, r2433 causes the faults. The patch is
  small, and in the ChangeLog:

 Please try this patch against v2.3.x. A double free issue on a task TCB
 already scheduled for memory release was causing all sorts of troubles,
 basically trashing the system heap afterwards:

Thanks, we'll try, test and report ASAP (= sometime this week).

Peter
-- 
Peter Soetens -- FMTC -- http://www.fmtc.be



[Xenomai-core] Summary: Xenomai 2.3.2 and 2.4 lock-ups and OOPSes

2007-09-07 Thread Peter Soetens
Just in case you dropped out of the long discussion about the issues we found
from Xenomai 2.3.2 on:

  o We are using the xeno_native skin, creating Xeno tasks and semaphores, but
have strong indications that the crashes are caused by the memory allocation
scheme of Xenomai in combination with task creation/deletion
  o We found two ways to break Xenomai, causing a 'Killed' (rt_task_delete) 
and causing an OOPS (rt_task_join).
  o They happen on 2.6.20 and 2.6.22 kernels
  o On the 2.3 branch, r2429 works, r2433 causes the faults. The patch is
small, and in the ChangeLog:

2007-05-11  Philippe Gerum  [EMAIL PROTECTED]

* include/nucleus/heap.h (xnfreesafe): Use xnpod_current_p() when
checking for deferral.

* include/nucleus/pod.h (xnpod_current_p): Give exec mode
awareness to this predicate, checking for primary/secondary mode
of shadows.

2007-05-11  Gilles Chanteperdrix  [EMAIL PROTECTED]

* ksrc/skins: Always defer thread memory release in deletion hook
by calling xnheap_schedule_free() instead of xnfreesafe().

  o We reverted this patch on HEAD of the 2.3 branch, but got -ENOMEM errors
during Xenomai resource allocations, indicating that later changes depend on
this patch. So we used a clean HEAD again further on to find the causes:
 o A first test (in Orocos) creates one thread and two semaphores, lets it wait
on them, and cleans up the thread.
 o During rt_task_delete, our program gets 'Killed' (without a joinable thread),
hence a user-space problem. However, gdb is of no use; all thread info is
lost.
 o We made the thread joinable (T_JOINABLE), and then joined (see the sketch
after this list). This bypassed the Kill on the first run but caused an OOPS
the second time the same application was started:

Oops:  [#1]
PREEMPT
CPU:0
EIP:0060:[<fef4a1f3>]Not tainted VLI
EFLAGS: 00010002   (2.6.20.9-ipipe-1.8-08 #2)
EIP is at get_free_range+0x56/0x160 [xeno_nucleus]
eax: f3a81d01   ebx: 0200   ecx: 0101   edx: fef62b00
esi: 0101   edi: 0200   ebp: f0f33ec4   esp: f0f33e98
ds: 007b   es: 007b   ss: 0068
Process NonPeriodicActi (pid: 3020, ti=f0f32000 task=f7ce61b0 
task.ti=f0f32000)
Stack:  0600 fef62b80 f3a81b24 f3a8 fef62ba4 f3a80720 0101
   0600 f0f33f18 f7ce6360 f0f33ee4 fef4a948 fef62b80 f0f33f08 
   0400 f0f33f18 f7ce6360 f0f33f50 ff13e1de 0282 0282 bfab6350
Call Trace:
 [<c0103ffb>] show_trace_log_lvl+0x1f/0x35
 [<c01040bb>] show_stack_log_lvl+0xaa/0xcf
 [<c01042a9>] show_registers+0x1c9/0x392
 [<c0104588>] die+0x116/0x245
 [<c0110fca>] do_page_fault+0x287/0x61d
 [<c010ea35>] __ipipe_handle_exception+0x63/0x136
 [<c029466d>] error_code+0x79/0x88
 [<fef4a948>] xnheap_alloc+0x15b/0x17d [xeno_nucleus]
 [<ff13e1de>] __rt_task_create+0xe0/0x171 [xeno_native]
 [<fef5655f>] losyscall_event+0xaf/0x170 [xeno_nucleus]
 [<c0138804>] __ipipe_dispatch_event+0xc0/0x1da
 [<c010e90b>] __ipipe_syscall_root+0x43/0x10a
 [<c0102e79>] system_call+0x29/0x41
 ===
Code: 74 61 85 c0 74 5d c7 45 e0 00 00 00 00 8b 4d e4 8b 49 10 89 4d ec 85 c9 
74 38 8b 45 dc 8b 78 0c 89 4d f0 89 ce 89 fb eb 02 89 ce 8b 09 8d 04 3e 39 
c1 0f 94 c2 3b 5d d8 0f 92 c0 01 fb 84 c2 75
EIP: [<fef4a1f3>] get_free_range+0x56/0x160 [xeno_nucleus] SS:ESP
0068:f0f33e98
[hard lockup]

  o Our application also mixes the original RT_TASK struct and the return
value of the rt_task_self() function call when calling rt_ functions.
Switching between the two influences the crashing behaviour as well; this
was not further investigated.

  o This was reproduced on two different systems (one with SMI workaround 
working)
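
The create/join pattern from the third bullet above, reduced to a sketch
(entry function, name and priority are made up; error handling omitted):

#include <native/task.h>

static void worker(void *arg)
{
	/* wait on the semaphores, do the work ... */
}

void run_once(void)
{
	RT_TASK task;

	rt_task_create(&task, "worker", 0, 50, T_JOINABLE);
	rt_task_start(&task, &worker, NULL);
	/* ... */
	rt_task_join(&task);	/* fine on the first run, OOPSes on the second */
}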
 
You have the patch that broke things; I hope this gives you a hint about what
causes our crashes. Know that Orocos as-is has worked with Xenomai from
Xenomai 2.0 on.

Peter

-- 
Peter Soetens -- FMTC -- http://www.fmtc.be
