Re: [Xenomai-core] [RFC][patch] per-process data.
On 08/05/06, Gilles Chanteperdrix [EMAIL PROTECTED] wrote:
> Jan Kiszka wrote:
>> Likely I did not yet get the full picture: what prevents using another
>> Adeos per-task key for this?
>
> We would need a per-task key for every skin that needs per-process data,
> but more importantly, we would need to track the clone syscalls (that
> would be another Adeos event) in order to propagate the per-task data to
> each thread of the process. Or maybe ptds are inherited upon clone?

All of them are cleared in kernel/fork.c :: copy_process(). At least, the
one pointed to by nkgkptd (which keeps the shadow's xnthread_t *) must be
cleared. Another logic could be applied for the rest, I guess.

p.s. I haven't read your patch yet, so all said above is out of the
current context :)

--
Best regards,
Dmitry Adamushko
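For readers following along: the clearing Dmitry mentions happens while the
child's task_struct is being set up. Below is a minimal sketch of the idea;
the array name and key count (ptd[], IPIPE_ROOT_NPTDKEYS) are assumptions
taken from typical I-pipe patches of that era, not a quote of the exact
upstream code:

    /* kernel/fork.c :: copy_process() -- illustrative sketch only; the
     * field and constant names are assumed and may differ between I-pipe
     * versions. */
    #ifdef CONFIG_IPIPE
    {
            int k;

            /* per-task data slots are not inherited across clone()/fork() */
            for (k = 0; k < IPIPE_ROOT_NPTDKEYS; k++)
                    p->ptd[k] = NULL;
    }
    #endif /* CONFIG_IPIPE */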
Re: [Xenomai-core] [RFC] Micro-optimisations for the libs
Wolfgang Grandegger wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Hi,
>>>
>>> [Daniel, I put you in the CC as you showed some interest in this topic.]
>>>
>>> As I indicated some weeks ago, I had a closer look at the code the
>>> user-space libs currently produce (on x86). The following considerations
>>> are certainly not worth noticeable microseconds on GHz boxes, but they
>>> may buy us (yet another) few micros on low-end.
>>>
>>> First of all, there is some redundant code in the syscall path of each
>>> skin service. This is due to the fact that the function code is
>>> calculated based on the skin mux id each time a service is invoked. The
>>> mux id has to be shifted and masked in order to combine it with the
>>> constant function code part - this could also easily happen ahead of
>>> time, saving code and cycles for each service entry point. Here is a
>>> commented disassembly of some simple native skin service which only
>>> takes one argument.
>>>
>>> Function prologue:
>>>  460:  55                    push   %ebp
>>>  461:  89 e5                 mov    %esp,%ebp
>>>  463:  57                    push   %edi
>>>  464:  83 ec 10              sub    $0x10,%esp
>>>
>>> Loading the skin mux-id:
>>>  467:  a1 00 00 00 00        mov    0x0,%eax
>>>
>>> Loading the argument (here: some pointer):
>>>  46c:  8b 7d 08              mov    0x8(%ebp),%edi
>>>
>>> Calculating the function code:
>>>  46f:  c1 e0 10              shl    $0x10,%eax
>>>  472:  25 00 00 ff 00        and    $0xff0000,%eax
>>>  477:  0d 2b 02 00 08        or     $0x800022b,%eax
>>>
>>> Saving the code:
>>>  47c:  89 45 f8              mov    %eax,0xfffffff8(%ebp)
>>>  47f:  53                    push   %ebx
>>>
>>> Loading the arguments (here only one):
>>>  480:  89 fb                 mov    %edi,%ebx
>>>
>>> Restoring the code again, issuing the syscall:
>>>  482:  8b 45 f8              mov    0xfffffff8(%ebp),%eax
>>>  485:  cd 80                 int    $0x80
>>>  487:  5b                    pop    %ebx
>>>
>>> Function epilogue:
>>>  488:  83 c4 10              add    $0x10,%esp
>>>  48b:  5f                    pop    %edi
>>>  48c:  5d                    pop    %ebp
>>>  48d:  c3                    ret
>>>
>>> Looking at this code, I also started thinking about inlining short and
>>> probably heavily-used functions into the user code. This would save the
>>> function prologue/epilogue both in the lib and the user code itself.
>>> For sure, it only makes sense for time-critical functions (think of
>>> mutex_lock/unlock or rt_timer_read). But inlining could be made optional
>>
>> The best optimization for rt_timer_read() would be to do the cycles-to-ns
>> conversion in user-space from a direct TSC reading if the arch supports
>> it (most do). Of course, this would only be possible for strictly
>> aperiodic timing setups (i.e. CONFIG_XENO_OPT_PERIODIC_TIMING off). For
>> rt_mutex_lock()/unlock(), we still need to refrain from calling the
>> kernel for uncontended access by using some Xeno equivalent of the futex
>> approach, which would suppress most of the incentive to micro-optimize
>> the call itself.

Ack. That's a bit more complex to realise but should be put on our to-do
list as well.

>>> for the user by providing both the library variant and the inlined
>>> version. The users could then select the preferred one by #defining
>>> some control switch before including the skin headers.
>>>
>>> Any thoughts on this? And, almost more important, anyone around willing
>>> to work on these optimisations and evaluate the results? I can't ATM.
>>
>> Quite frankly, I remember that I once had to clean up the LXRT inlining
>> support in RTAI 3.0/3.1, and this was far from being fun stuff to do.
>> Basically, AFAICT, having both inline and out-of-line support for
>> library calls almost invariably ends up in a maintenance nightmare of
>> some sort, e.g. depending on whether to compile with gcc's optimization
>> on or not, which might be dictated by the fact that one also wants
>> (exploitable) debug information or not, and so on. Not to speak of the
>> fact that you end up having two implementations to maintain separately.
>> This said, only the figures would tell us if such inlining brings
>> something significant or not to the picture performance-wise on low-end
>> hw, so I'd be interested to see those first.
>
> I agree! I have also doubts that using inline functions will improve
> latencies. Larger code also results in more TLB misses and cache refills.

I'm definitely not arguing for the aggressive inlining RTAI provides (all
or nothing). I'm rather suggesting selective optional inlining of trivial
and heavily-used stubs. Think of something like rt_sem_v() for the library
call vs. rt_sem_v_inlined() for an unrolled variant. rt_sem_v() could
simply be implemented by calling rt_sem_v_inlined(), thus there should
also be no reason why the maintenance hell might get reopened.

I think the order I presented should be kept when looking at these
optimisations:

1. reduce the complexity of existing syscall code,
2. consider if we can provide any benefit to the user by offering /some/ functions
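To make the first optimisation point above concrete, here is a rough sketch
of the ahead-of-time computation Jan describes: shift and mask the skin mux
id once when the library binds to the skin, so each service entry point only
ORs in its constant function code. All identifiers below (__muxcode_base,
__skin_bind, XENO_SKIN_CALL) are invented for illustration and are not the
actual Xenomai symbols:

    /* Hypothetical sketch -- not the real library code. */
    static unsigned long __muxcode_base;  /* skin mux id, pre-shifted and masked */

    void __skin_bind(int muxid)
    {
            /* done once at bind time instead of in every service stub */
            __muxcode_base = ((unsigned long)muxid << 16) & 0xff0000;
    }

    /* per-call cost drops to a single OR with an immediate constant */
    #define XENO_SKIN_CALL(op)      (__muxcode_base | (op))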
Re: [Xenomai-core] [RFC][patch] per-process data.
Dmitry Adamushko wrote:
> On 08/05/06, Gilles Chanteperdrix [EMAIL PROTECTED] wrote:
>> Jan Kiszka wrote:
>>> Likely I did not yet get the full picture: what prevents using another
>>> Adeos per-task key for this?
>>
>> We would need a per-task key for every skin that needs per-process data,
>> but more importantly, we would need to track the clone syscalls (that
>> would be another Adeos event) in order to propagate the per-task data to
>> each thread of the process. Or maybe ptds are inherited upon clone?
>
> All of them are cleared in kernel/fork.c :: copy_process(). At least, the
> one pointed to by nkgkptd (which keeps the shadow's xnthread_t *) must be
> cleared. Another logic could be applied for the rest, I guess.

It makes sense: if they were not cleared, they would not be per-thread data.

--
Gilles Chanteperdrix.
Re: [Xenomai-core] [RFC][patch] per-process data.
Gilles Chanteperdrix wrote:
> These patches are not ready for inclusion, they are not tested yet.

The attached versions are tested. I still wonder if handling this in
shadow.c is the right solution, or if there should be an xnppd_set call
that could be called from within the skins' event callbacks.

--
Gilles Chanteperdrix.

--- ./include/asm-i386/ipipe.h~	2006-05-07 16:00:37.000000000 +0200
+++ ./include/asm-i386/ipipe.h	2006-05-07 19:00:34.000000000 +0200
@@ -135,7 +135,8 @@ do { \
 #define IPIPE_EVENT_SIGWAKE	(IPIPE_FIRST_EVENT + 2)
 #define IPIPE_EVENT_SETSCHED	(IPIPE_FIRST_EVENT + 3)
 #define IPIPE_EVENT_EXIT	(IPIPE_FIRST_EVENT + 4)
-#define IPIPE_LAST_EVENT	IPIPE_EVENT_EXIT
+#define IPIPE_EVENT_PEXIT	(IPIPE_FIRST_EVENT + 5)
+#define IPIPE_LAST_EVENT	IPIPE_EVENT_PEXIT
 #define IPIPE_NR_EVENTS	(IPIPE_LAST_EVENT + 1)
 
 struct ipipe_domain;
--- ./include/linux/ipipe.h~	2006-05-07 16:00:36.000000000 +0200
+++ ./include/linux/ipipe.h	2006-05-07 19:01:01.000000000 +0200
@@ -439,6 +439,12 @@ static inline void ipipe_exit_notify(str
 	__ipipe_dispatch_event(IPIPE_EVENT_EXIT,p);
 }
 
+static inline void ipipe_pexit_notify(struct mm_struct *m)
+{
+	if (__ipipe_event_pipelined_p(IPIPE_EVENT_PEXIT))
+		__ipipe_dispatch_event(IPIPE_EVENT_PEXIT,m);
+}
+
 static inline int ipipe_trap_notify(int ex, struct pt_regs *regs)
 {
 	return __ipipe_event_pipelined_p(ex) ? __ipipe_dispatch_event(ex,regs) : 0;
@@ -633,6 +639,7 @@ void fastcall *ipipe_get_ptd(int key);
 #define ipipe_sigwake_notify(p)		do { } while(0)
 #define ipipe_setsched_notify(p)	do { } while(0)
 #define ipipe_exit_notify(p)		do { } while(0)
+#define ipipe_mmput_notify(m)		do { } while(0)
 #define ipipe_init_proc()		do { } while(0)
 #define ipipe_trap_notify(t,r)		0
--- ./kernel/fork.c~	2006-05-07 16:00:36.000000000 +0200
+++ ./kernel/fork.c	2006-05-07 19:00:02.000000000 +0200
@@ -368,6 +368,7 @@ void fastcall __mmdrop(struct mm_struct
 void mmput(struct mm_struct *mm)
 {
 	if (atomic_dec_and_test(&mm->mm_users)) {
+		ipipe_pexit_notify(mm);
 		exit_aio(mm);
 		exit_mmap(mm);
 		if (!list_empty(&mm->mmlist)) {
Index: include/asm-generic/hal.h
===================================================================
--- include/asm-generic/hal.h	(revision 1029)
+++ include/asm-generic/hal.h	(working copy)
@@ -236,6 +236,14 @@
     return RTHAL_EVENT_PROPAGATE; \
 }
 
+#define RTHAL_DECLARE_PEXIT_EVENT(hdlr) \
+static int hdlr (unsigned event, struct ipipe_domain *ipd, void *data) \
+{ \
+    struct mm_struct *mm = (struct mm_struct *)data; \
+    do_##hdlr(mm); \
+    return RTHAL_EVENT_PROPAGATE; \
+}
+
 #ifndef IPIPE_EVENT_SELF /* Some early I-pipe versions don't have this. */
 #define IPIPE_EVENT_SELF 0
@@ -255,6 +263,8 @@
 #define IPIPE_WIRED_MASK 0
 #endif /* !IPIPE_WIRED_MASK */
 
+#define rthal_catch_pexit(hdlr) \
+    ipipe_catch_event(ipipe_root_domain,IPIPE_EVENT_PEXIT,hdlr)
 #define rthal_catch_taskexit(hdlr) \
     ipipe_catch_event(ipipe_root_domain,IPIPE_EVENT_EXIT,hdlr)
 #define rthal_catch_sigwake(hdlr) \
--- /dev/null	2006-05-08 00:21:12.004363750 +0200
+++ include/nucleus/ppd.h	2006-05-07 18:42:46.000000000 +0200
@@ -0,0 +1,22 @@
+#ifndef PPD_H
+#define PPD_H
+
+#include <nucleus/queue.h>
+
+struct mm_struct;
+
+typedef struct xnppd_key {
+    unsigned long skin_magic;
+    struct mm_struct *mm;
+} xnppd_key_t;
+
+typedef struct xnppd_holder {
+    xnppd_key_t key;
+    xnholder_t link;
+#define link2ppd(laddr) \
+    (xnppd_holder_t *)((char *)(laddr) - offsetof(xnppd_holder_t, link))
+} xnppd_holder_t;
+
+xnppd_holder_t *xnppd_get(unsigned skin_magic);
+
+#endif /* PPD_H */
Index: include/nucleus/shadow.h
===================================================================
--- include/nucleus/shadow.h	(revision 1029)
+++ include/nucleus/shadow.h	(working copy)
@@ -48,7 +48,7 @@
     unsigned magic;
     int nrcalls;
     atomic_counter_t refcnt;
-    int (*eventcb)(int);
+    void *(*eventcb)(int, void *);
     xnsysent_t *systab;
 #ifdef CONFIG_PROC_FS
     struct proc_dir_entry *proc;
@@ -89,7 +89,7 @@
 			       unsigned magic,
 			       int nrcalls,
 			       xnsysent_t *systab,
-			       int (*eventcb)(int event));
+			       void *(*eventcb)(int event, void *data));
 
 int xnshadow_unregister_interface(int muxid);
Index: ksrc/nucleus/shadow.c
===================================================================
--- ksrc/nucleus/shadow.c	(revision 1029)
+++ ksrc/nucleus/shadow.c	(working copy)
@@ -48,6 +48,8 @@
 #include <nucleus/shadow.h>
 #include
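To illustrate what the new eventcb signature above enables, here is a hedged
sketch of how a skin might hand per-process data back to the nucleus on
attach and release it on detach. The exact attach/detach contract is still
being discussed in this thread, and my_skin_ppd_t, my_skin_eventcb() and the
kmalloc/kfree choice are illustrative assumptions only:

    /* Sketch only -- the real contract between shadow.c and the skins is
     * the subject of this thread. */
    typedef struct my_skin_ppd {
            xnppd_holder_t holder;  /* queued by the nucleus, keyed on current->mm */
            /* ... whatever per-process resources the skin needs ... */
    } my_skin_ppd_t;

    static void *my_skin_eventcb(int event, void *data)
    {
            switch (event) {
            case XNSHADOW_CLIENT_ATTACH: {
                    /* return the ppd info the nucleus registers for this process */
                    my_skin_ppd_t *ppd = kmalloc(sizeof(*ppd), GFP_KERNEL);
                    return ppd ? &ppd->holder : NULL;
            }
            case XNSHADOW_CLIENT_DETACH:
                    /* 'data' is the holder registered at attach time; it is
                     * the first member, so freeing it frees the whole ppd */
                    kfree(data);
                    break;
            }

            return NULL;
    }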
Re: [Xenomai-core] [RFC][patch] per-process data.
Gilles Chanteperdrix wrote:
> Gilles Chanteperdrix wrote:
>> These patches are not ready for inclusion, they are not tested yet.
>
> The attached versions are tested. I still wonder if handling this in
> shadow.c is the right solution, or if there should be an xnppd_set call
> that could be called from within the skins' event callbacks.

That would likely be better, since some skin might not want any cleanup
code to be called, and in the current implementation, every skin needs to
provide some ppd info when returning from the event callback, even if only
the CLIENT_ATTACH event is to be monitored. On the other hand, we could
argue that registering an event callback requires implementing all
requests, including CLIENT_DETACH, possibly by returning a no-op value, so
that no ppd registration would be done from bind_to_interface(). Actually,
I think the latter would be the better option.

> --- ./include/asm-i386/ipipe.h~	2006-05-07 16:00:37.000000000 +0200
> +++ ./include/asm-i386/ipipe.h	2006-05-07 19:00:34.000000000 +0200
> @@ -135,7 +135,8 @@ do { \
>  #define IPIPE_EVENT_SIGWAKE	(IPIPE_FIRST_EVENT + 2)
>  #define IPIPE_EVENT_SETSCHED	(IPIPE_FIRST_EVENT + 3)
>  #define IPIPE_EVENT_EXIT	(IPIPE_FIRST_EVENT + 4)
> -#define IPIPE_LAST_EVENT	IPIPE_EVENT_EXIT
> +#define IPIPE_EVENT_PEXIT	(IPIPE_FIRST_EVENT + 5)
> +#define IPIPE_LAST_EVENT	IPIPE_EVENT_PEXIT
>  #define IPIPE_NR_EVENTS	(IPIPE_LAST_EVENT + 1)

I've merged such support for process cleanup event notification in the
Adeos codebase, it will be available with the next patches to come. I've
named it IPIPE_EVENT_CLEANUP instead of PEXIT though, since that's what it
is about, and EXIT/PEXIT might be confusing.

> +}
> +
> +static inline struct xnppd_holder *
> +xnppd_lookup(unsigned skin_magic, struct mm_struct *mm, unsigned remove)
> +{

Let's separate the lookup and removal ops here, making the latter return
the removed element. All-in-one stuff may save a few lines but is a pain
for readability.

> +static inline void do_pexit_event (struct mm_struct *mm)
> +{
> +    unsigned muxid;
> +    spl_t s;
> +
> +    for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++)
> +    {
> +        xnlock_get_irqsave(&nklock, s);
> +        if (muxtable[muxid].systab && muxtable[muxid].eventcb)
> +        {
> +            struct xnppd_holder *ppd;
> +
> +            ppd = xnppd_lookup(muxtable[muxid].magic, mm, 1);
> +
> +            if (ppd)
> +                muxtable[muxid].eventcb(XNSHADOW_CLIENT_DETACH, ppd);
> +        }
> +        xnlock_put_irqrestore(&nklock, s);
> +    }
> +}
> +

Maybe a better approach would be to simply hash holders referring to busy
muxtable entries on mm struct pointers. This way, we would only have to
scan a list holding one element per attached skin per terminating process,
which will most of the time be a matter of walking through a 1/2-element
queue. I.e. indexing table entries on mm pointers from bind_to_interface()
to ask for the CLIENT_DETACH event to be fired upon process cleanup:

(key: curr->mm) -> (holder: muxtable[muxid1]) -> (holder: muxtable[muxid2]) -> (holder: muxtable[muxid3])

then scanning the list hashed on the terminating process's mm struct, in
order to fire the appropriate event callbacks.

Btw, on 2.4 at least, the global Linux mm_lock is held before firing the
process cleanup notification at Adeos level, so one has to be careful
about what's actually done in those per-skin cleanup handlers.

--
Philippe.
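A rough sketch of the mm-keyed hashing Philippe suggests, built on the
nucleus queue primitives declared in nucleus/queue.h; the bucket count,
hash function and helper names below are arbitrary illustrations, not part
of the posted patch:

    /* Illustration only -- not the actual implementation. */
    #define PPD_HASH_SIZE 32

    static xnqueue_t ppd_hash[PPD_HASH_SIZE];  /* one short queue per bucket */

    static inline unsigned ppd_hash_mm(struct mm_struct *mm)
    {
            return ((unsigned long)mm >> 5) % PPD_HASH_SIZE;
    }

    /* bind_to_interface(): queue one holder per attached skin under curr->mm */
    static inline void ppd_insert(xnppd_holder_t *holder)
    {
            appendq(&ppd_hash[ppd_hash_mm(holder->key.mm)], &holder->link);
    }

    /* process cleanup: walk the (typically one- or two-element) bucket for
     * the dying mm and fire XNSHADOW_CLIENT_DETACH for each skin found. */

The buckets would of course need initq() at init time and the same nklock
protection the patch already uses around the muxtable.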
Re: [Xenomai-core] [patch] prepare-kernel shouldnt alter files thru links
Jim Cromie wrote:
> attached patch corrects a mistake in rev 985, which chmod'd a read-only
> file, even if it was a symlink from a kernel tree cloned with lndir. This
> resulted in a bad original tree for use in building vanilla kernels.
>
> with the patch, the script renames the symlink, copies it to the expected
> name, and *then* chmods it and appends to it. Now vanilla kernel builds
> also using the same tree re-make w/o actually recompiling anything.

Sounds reasonable. Applied, thanks.

--
Philippe.
Re: [Xenomai-core] kconfig help questions
Jim Cromie wrote:
> Going thru the Xenomai Kconfig again, some observations/uncertainties
> came up. I'll make a patch, given feedback.
>
> XENO_OPT_TIMING_PERIODIC
>
> "Aperiodic mode provides for non-constant delays between timer ticks"
>
> the wording here (non-constant delays) left me momentarily wondering if
> _APERIODIC was bad (this, despite the use of 'provides'). Maybe
> "sub-tick delays" makes some sense ..

The fundamental issue is that aperiodic mode is tick-less, in the sense
that it does not define any constant period value; in this respect,
introducing the sub-tick notion would not make much sense. What about
"arbitrary delays" instead?

> XENO_OPT_SHIRQ_LEVEL
>
> Are there any decent estimates or examples of the latency / jitter
> increases if this is enabled? Is it highly cpu or chipset dependent?

More cache misses since we need to scan the entire chain of handlers, and
obviously more processing from the various handlers.

> I presume /proc/interrupts is the place to look to see if you might need
> it, iff any of the devices of interest are shared. my laptop shows
> sharing, my soekris does not
>
>   3:       5  XT-PIC  ehci_hcd:usb1, ohci1394
>   4:       0  XT-PIC  uhci_hcd:usb6
>   7:  708975  XT-PIC  ipw2200, ehci_hcd:usb2, uhci_hcd:usb7
>   8:       1  XT-PIC  rtc
>   9:  437509  XT-PIC  acpi, Intel 82801DB-ICH4, Intel 82801DB-ICH4 Modem,
>                       ohci_hcd:usb3, ohci_hcd:usb4, uhci_hcd:usb5, yenta, eth0
>
> # cat /proc/interrupts
>            CPU0
>   0:  13148086  XT-PIC  timer
>   2:         0  XT-PIC  cascade
>   4:     33084  XT-PIC  serial
>   8:         4  XT-PIC  rtc
>  10:    337134  XT-PIC  eth0
>  11:    926639  XT-PIC  ndiswrapper
>  14:        11  XT-PIC  ide0
> NMI:         0

It's more a sharing in the RT sense, so you likely would not have to share
the IRQs with the non-RT Linux side altogether, since this adds more
complexity to the picture (specifically: how to share the ack/eoi cycle
between the RT and non-RT sides).

> XENO_OPT_SHIRQ_EDGE
>
> does this only apply to the ISA bus?

Not exclusively. x86-wise, the IO-APIC is capable of handling both edge
and level interrupts.

> XENO_SKIN_NATIVE
>
> will add module name
>
> XENO_OPT_NATIVE_INTR
>
> "Note that the preferred way of implementing generic drivers usable
> across all Xenomai interfaces is defined by the Real-Time Driver Model
> (RTDM)."
>
> It doesn't say they're (in)?compatible, should it?

They are not. The native interrupt support provides basic access to the
core interrupt layer that RTDM uses for its own purpose too.

> SUMMARY
>
> I'd like to see estimates of the latency costs associated with each
> choice, where it's practical to do so (given the current unknowables,
> like variations across cpus, chipsets etc). I recognize that such numbers
> are the hoped-for end result of the xenomai-data ML, and perhaps nothing
> real can be said yet (or ever) in Kconfig, except in (overly?) broad
> statements.

IMO, it seems rather difficult to state something sensible regarding the
latency introduced by IRQ sharing in the Kconfig help strings, since it
would basically depend on what the shared IRQ handlers are doing in the
first place.

--
Philippe.
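For the record, the extra cost Philippe describes for level-triggered IRQ
sharing comes from the handler chain walk; below is a minimal sketch of that
idea. The structure and function names are invented for illustration and do
not reflect the nucleus' real interrupt code:

    /* Illustration only. */
    struct rt_irq_handler {
            int (*isr)(void *cookie);
            void *cookie;
            struct rt_irq_handler *next;
    };

    static void shared_irq_trampoline(struct rt_irq_handler *chain)
    {
            int handled = 0;

            /* every handler on the chain is visited: more cache misses and
             * more processing per hardware interrupt than with one handler */
            for (; chain != NULL; chain = chain->next)
                    handled |= chain->isr(chain->cookie);

            /* the IRQ line can only be acknowledged/ended once the whole
             * chain has run, which is part of the ack/eoi sharing problem
             * mentioned above */
            (void)handled;
    }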