Re: [Xenomai-core] [RFC][patch] per-process data.
On 08/05/06, Gilles Chanteperdrix [EMAIL PROTECTED] wrote:
> Jan Kiszka wrote:
>> Likely I did not yet get the full picture: what prevents using another
>> Adeos per-task key for this?
>
> We would need a per-task key for every skin that needs per-process data,
> but more importantly, we would need to track the clone syscalls (that
> would be another Adeos event) in order to propagate the per-task data to
> each thread of the process. Or maybe ptds are inherited upon clone?

All of them are cleared in kernel/fork.c :: copy_process(). At least, the
one pointed to by nkgkptd (which keeps the shadow's xnthread_t *) must be
cleared. Another logic could be applied for the rest, I guess.

p.s. I haven't read your patch yet, so all said above is out of the
current context :)

--
Best regards,
Dmitry Adamushko
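For readers following along: the clearing Dmitry mentions happens while the
child's task_struct is being set up. Below is a minimal sketch of the idea;
the array name and key count (ptd[], IPIPE_ROOT_NPTDKEYS) are assumptions
taken from typical I-pipe patches of that era, not a quote of the exact
upstream code:

    /* kernel/fork.c :: copy_process() -- illustrative sketch only; the
     * field and constant names are assumed and may differ between I-pipe
     * versions. */
    #ifdef CONFIG_IPIPE
    {
            int k;

            /* per-task data slots are not inherited across clone()/fork() */
            for (k = 0; k < IPIPE_ROOT_NPTDKEYS; k++)
                    p->ptd[k] = NULL;
    }
    #endif /* CONFIG_IPIPE */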
Re: [Xenomai-core] [RFC] Micro-optimisations for the libs
Wolfgang Grandegger wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Hi,
>>>
>>> [Daniel, I put you in the CC as you showed some interest in this topic.]
>>>
>>> As I indicated some weeks ago, I had a closer look at the code the
>>> user-space libs currently produce (on x86). The following considerations
>>> are certainly not worth noticeable microseconds on GHz boxes, but they
>>> may buy us (yet another) few micros on low-end.
>>>
>>> First of all, there is some redundant code in the syscall path of each
>>> skin service. This is due to the fact that the function code is
>>> calculated based on the skin mux id each time a service is invoked. The
>>> mux id has to be shifted and masked in order to combine it with the
>>> constant function code part - this could also easily happen ahead of
>>> time, saving code and cycles for each service entry point. Here is a
>>> commented disassembly of some simple native skin service which only
>>> takes one argument.
>>>
>>> Function prologue:
>>>  460:  55                    push   %ebp
>>>  461:  89 e5                 mov    %esp,%ebp
>>>  463:  57                    push   %edi
>>>  464:  83 ec 10              sub    $0x10,%esp
>>>
>>> Loading the skin mux-id:
>>>  467:  a1 00 00 00 00        mov    0x0,%eax
>>>
>>> Loading the argument (here: some pointer):
>>>  46c:  8b 7d 08              mov    0x8(%ebp),%edi
>>>
>>> Calculating the function code:
>>>  46f:  c1 e0 10              shl    $0x10,%eax
>>>  472:  25 00 00 ff 00        and    $0xff0000,%eax
>>>  477:  0d 2b 02 00 08        or     $0x800022b,%eax
>>>
>>> Saving the code:
>>>  47c:  89 45 f8              mov    %eax,0xfffffff8(%ebp)
>>>  47f:  53                    push   %ebx
>>>
>>> Loading the arguments (here only one):
>>>  480:  89 fb                 mov    %edi,%ebx
>>>
>>> Restoring the code again, issuing the syscall:
>>>  482:  8b 45 f8              mov    0xfffffff8(%ebp),%eax
>>>  485:  cd 80                 int    $0x80
>>>  487:  5b                    pop    %ebx
>>>
>>> Function epilogue:
>>>  488:  83 c4 10              add    $0x10,%esp
>>>  48b:  5f                    pop    %edi
>>>  48c:  5d                    pop    %ebp
>>>  48d:  c3                    ret
>>>
>>> Looking at this code, I also started thinking about inlining short and
>>> probably heavily-used functions into the user code. This would save the
>>> function prologue/epilogue both in the lib and the user code itself.
>>> For sure, it only makes sense for time-critical functions (think of
>>> mutex_lock/unlock or rt_timer_read). But inlining could be made optional
>>
>> The best optimization for rt_timer_read() would be to do the cycles-to-ns
>> conversion in user-space from a direct TSC reading if the arch supports
>> it (most do). Of course, this would only be possible for strictly
>> aperiodic timing setups (i.e. CONFIG_XENO_OPT_PERIODIC_TIMING off). For
>> rt_mutex_lock()/unlock(), we still need to refrain from calling the
>> kernel for uncontended access by using some Xeno equivalent of the futex
>> approach, which would suppress most of the incentive to micro-optimize
>> the call itself.

Ack. That's a bit more complex to realise but should be put on our to-do
list as well.

>>> for the user by providing both the library variant and the inlined
>>> version. The users could then select the preferred one by #defining
>>> some control switch before including the skin headers.
>>>
>>> Any thoughts on this? And, almost more important, anyone around willing
>>> to work on these optimisations and evaluate the results? I can't ATM.
>>
>> Quite frankly, I remember that I once had to clean up the LXRT inlining
>> support in RTAI 3.0/3.1, and this was far from being fun stuff to do.
>> Basically, AFAICT, having both inline and out-of-line support for
>> library calls almost invariably ends up in a maintenance nightmare of
>> some sort, e.g. depending on whether to compile with gcc's optimization
>> on or not, which might be dictated by the fact that one also wants
>> (exploitable) debug information or not, and so on. Not to speak of the
>> fact that you end up having two implementations to maintain separately.
>> This said, only the figures would tell us if such inlining brings
>> something significant or not to the picture performance-wise on low-end
>> hw, so I'd be interested to see those first.
>
> I agree! I have also doubts that using inline functions will improve
> latencies. Larger code also results in more TLB misses and cache refills.

I'm definitely not arguing for the aggressive inlining RTAI provides (all
or nothing). I'm rather suggesting selective optional inlining of trivial
and heavily-used stubs. Think of something like rt_sem_v() for the library
call vs. rt_sem_v_inlined() for an unrolled variant. rt_sem_v() could
simply be implemented by calling rt_sem_v_inlined(), thus there should
also be no reason why the maintenance hell might get reopened.

I think the order I presented should be kept when looking at these
optimisations:

1. reduce the complexity of existing syscall code,
2. consider if we can provide any benefit to the user by offering /some/ functions
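To make the first optimisation point above concrete, here is a rough sketch
of the ahead-of-time computation Jan describes: shift and mask the skin mux
id once when the library binds to the skin, so each service entry point only
ORs in its constant function code. All identifiers below (__muxcode_base,
__skin_bind, XENO_SKIN_CALL) are invented for illustration and are not the
actual Xenomai symbols:

    /* Hypothetical sketch -- not the real library code. */
    static unsigned long __muxcode_base;  /* skin mux id, pre-shifted and masked */

    void __skin_bind(int muxid)
    {
            /* done once at bind time instead of in every service stub */
            __muxcode_base = ((unsigned long)muxid << 16) & 0xff0000;
    }

    /* per-call cost drops to a single OR with an immediate constant */
    #define XENO_SKIN_CALL(op)      (__muxcode_base | (op))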
Re: [Xenomai-core] [RFC][patch] per-process data.
Dmitry Adamushko wrote:
> On 08/05/06, Gilles Chanteperdrix [EMAIL PROTECTED] wrote:
>> Jan Kiszka wrote:
>>> Likely I did not yet get the full picture: what prevents using another
>>> Adeos per-task key for this?
>>
>> We would need a per-task key for every skin that needs per-process data,
>> but more importantly, we would need to track the clone syscalls (that
>> would be another Adeos event) in order to propagate the per-task data to
>> each thread of the process. Or maybe ptds are inherited upon clone?
>
> All of them are cleared in kernel/fork.c :: copy_process(). At least, the
> one pointed to by nkgkptd (which keeps the shadow's xnthread_t *) must be
> cleared. Another logic could be applied for the rest, I guess.

It makes sense: if they were not cleared, they would not be per-thread data.

--
Gilles Chanteperdrix.
Re: [Xenomai-core] [RFC][patch] per-process data.
Gilles Chanteperdrix wrote:
> These patches are not ready for inclusion, they are not tested yet.

The attached versions are tested. I still wonder if handling this in
shadow.c is the right solution, or if there should be an xnppd_set call
that could be called from within the skins' event callbacks.

--
Gilles Chanteperdrix.

--- ./include/asm-i386/ipipe.h~	2006-05-07 16:00:37.000000000 +0200
+++ ./include/asm-i386/ipipe.h	2006-05-07 19:00:34.000000000 +0200
@@ -135,7 +135,8 @@ do { \
 #define IPIPE_EVENT_SIGWAKE	(IPIPE_FIRST_EVENT + 2)
 #define IPIPE_EVENT_SETSCHED	(IPIPE_FIRST_EVENT + 3)
 #define IPIPE_EVENT_EXIT	(IPIPE_FIRST_EVENT + 4)
-#define IPIPE_LAST_EVENT	IPIPE_EVENT_EXIT
+#define IPIPE_EVENT_PEXIT	(IPIPE_FIRST_EVENT + 5)
+#define IPIPE_LAST_EVENT	IPIPE_EVENT_PEXIT
 #define IPIPE_NR_EVENTS	(IPIPE_LAST_EVENT + 1)
 
 struct ipipe_domain;
--- ./include/linux/ipipe.h~	2006-05-07 16:00:36.000000000 +0200
+++ ./include/linux/ipipe.h	2006-05-07 19:01:01.000000000 +0200
@@ -439,6 +439,12 @@ static inline void ipipe_exit_notify(str
 	__ipipe_dispatch_event(IPIPE_EVENT_EXIT,p);
 }
 
+static inline void ipipe_pexit_notify(struct mm_struct *m)
+{
+	if (__ipipe_event_pipelined_p(IPIPE_EVENT_PEXIT))
+		__ipipe_dispatch_event(IPIPE_EVENT_PEXIT,m);
+}
+
 static inline int ipipe_trap_notify(int ex, struct pt_regs *regs)
 {
 	return __ipipe_event_pipelined_p(ex) ? __ipipe_dispatch_event(ex,regs) : 0;
@@ -633,6 +639,7 @@ void fastcall *ipipe_get_ptd(int key);
 #define ipipe_sigwake_notify(p)		do { } while(0)
 #define ipipe_setsched_notify(p)	do { } while(0)
 #define ipipe_exit_notify(p)		do { } while(0)
+#define ipipe_mmput_notify(m)		do { } while(0)
 #define ipipe_init_proc()		do { } while(0)
 #define ipipe_trap_notify(t,r)		0
--- ./kernel/fork.c~	2006-05-07 16:00:36.000000000 +0200
+++ ./kernel/fork.c	2006-05-07 19:00:02.000000000 +0200
@@ -368,6 +368,7 @@ void fastcall __mmdrop(struct mm_struct
 void mmput(struct mm_struct *mm)
 {
 	if (atomic_dec_and_test(&mm->mm_users)) {
+		ipipe_pexit_notify(mm);
 		exit_aio(mm);
 		exit_mmap(mm);
 		if (!list_empty(&mm->mmlist)) {
Index: include/asm-generic/hal.h
===================================================================
--- include/asm-generic/hal.h	(revision 1029)
+++ include/asm-generic/hal.h	(working copy)
@@ -236,6 +236,14 @@
     return RTHAL_EVENT_PROPAGATE; \
 }
 
+#define RTHAL_DECLARE_PEXIT_EVENT(hdlr) \
+static int hdlr (unsigned event, struct ipipe_domain *ipd, void *data) \
+{ \
+    struct mm_struct *mm = (struct mm_struct *)data; \
+    do_##hdlr(mm); \
+    return RTHAL_EVENT_PROPAGATE; \
+}
+
 #ifndef IPIPE_EVENT_SELF /* Some early I-pipe versions don't have this. */
 #define IPIPE_EVENT_SELF 0
@@ -255,6 +263,8 @@
 #define IPIPE_WIRED_MASK 0
 #endif /* !IPIPE_WIRED_MASK */
 
+#define rthal_catch_pexit(hdlr) \
+    ipipe_catch_event(ipipe_root_domain,IPIPE_EVENT_PEXIT,hdlr)
 #define rthal_catch_taskexit(hdlr) \
     ipipe_catch_event(ipipe_root_domain,IPIPE_EVENT_EXIT,hdlr)
 #define rthal_catch_sigwake(hdlr) \
--- /dev/null	2006-05-08 00:21:12.004363750 +0200
+++ include/nucleus/ppd.h	2006-05-07 18:42:46.000000000 +0200
@@ -0,0 +1,22 @@
+#ifndef PPD_H
+#define PPD_H
+
+#include <nucleus/queue.h>
+
+struct mm_struct;
+
+typedef struct xnppd_key {
+    unsigned long skin_magic;
+    struct mm_struct *mm;
+} xnppd_key_t;
+
+typedef struct xnppd_holder {
+    xnppd_key_t key;
+    xnholder_t link;
+#define link2ppd(laddr) \
+    (xnppd_holder_t *)((char *)(laddr) - offsetof(xnppd_holder_t, link))
+} xnppd_holder_t;
+
+xnppd_holder_t *xnppd_get(unsigned skin_magic);
+
+#endif /* PPD_H */
Index: include/nucleus/shadow.h
===================================================================
--- include/nucleus/shadow.h	(revision 1029)
+++ include/nucleus/shadow.h	(working copy)
@@ -48,7 +48,7 @@
     unsigned magic;
     int nrcalls;
     atomic_counter_t refcnt;
-    int (*eventcb)(int);
+    void *(*eventcb)(int, void *);
     xnsysent_t *systab;
 #ifdef CONFIG_PROC_FS
     struct proc_dir_entry *proc;
@@ -89,7 +89,7 @@
 			       unsigned magic,
 			       int nrcalls,
 			       xnsysent_t *systab,
-			       int (*eventcb)(int event));
+			       void *(*eventcb)(int event, void *data));
 
 int xnshadow_unregister_interface(int muxid);
Index: ksrc/nucleus/shadow.c
===================================================================
--- ksrc/nucleus/shadow.c	(revision 1029)
+++ ksrc/nucleus/shadow.c	(working copy)
@@ -48,6 +48,8 @@
 #include <nucleus/shadow.h>
 #include
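To illustrate what the new eventcb signature above enables, here is a hedged
sketch of how a skin might hand per-process data back to the nucleus on
attach and release it on detach. The exact attach/detach contract is still
being discussed in this thread, and my_skin_ppd_t, my_skin_eventcb() and the
kmalloc/kfree choice are illustrative assumptions only:

    /* Sketch only -- the real contract between shadow.c and the skins is
     * the subject of this thread. */
    typedef struct my_skin_ppd {
            xnppd_holder_t holder;  /* queued by the nucleus, keyed on current->mm */
            /* ... whatever per-process resources the skin needs ... */
    } my_skin_ppd_t;

    static void *my_skin_eventcb(int event, void *data)
    {
            switch (event) {
            case XNSHADOW_CLIENT_ATTACH: {
                    /* return the ppd info the nucleus registers for this process */
                    my_skin_ppd_t *ppd = kmalloc(sizeof(*ppd), GFP_KERNEL);
                    return ppd ? &ppd->holder : NULL;
            }
            case XNSHADOW_CLIENT_DETACH:
                    /* 'data' is the holder registered at attach time; it is
                     * the first member, so freeing it frees the whole ppd */
                    kfree(data);
                    break;
            }

            return NULL;
    }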
Re: [Xenomai-core] [RFC][patch] per-process data.
Gilles Chanteperdrix wrote:
> Gilles Chanteperdrix wrote:
>> These patches are not ready for inclusion, they are not tested yet.
>
> The attached versions are tested. I still wonder if handling this in
> shadow.c is the right solution, or if there should be an xnppd_set call
> that could be called from within the skins' event callbacks.

That would likely be better, since some skin might not want any cleanup
code to be called, and in the current implementation, every skin needs to
provide some ppd info when returning from the event callback, even if only
the CLIENT_ATTACH event is to be monitored. On the other hand, we could
argue that registering an event callback requires implementing all
requests, including CLIENT_DETACH, possibly by returning a no-op value, so
that no ppd registration would be done from bind_to_interface(). Actually,
I think the latter would be the better option.

> --- ./include/asm-i386/ipipe.h~	2006-05-07 16:00:37.000000000 +0200
> +++ ./include/asm-i386/ipipe.h	2006-05-07 19:00:34.000000000 +0200
> @@ -135,7 +135,8 @@ do { \
>  #define IPIPE_EVENT_SIGWAKE	(IPIPE_FIRST_EVENT + 2)
>  #define IPIPE_EVENT_SETSCHED	(IPIPE_FIRST_EVENT + 3)
>  #define IPIPE_EVENT_EXIT	(IPIPE_FIRST_EVENT + 4)
> -#define IPIPE_LAST_EVENT	IPIPE_EVENT_EXIT
> +#define IPIPE_EVENT_PEXIT	(IPIPE_FIRST_EVENT + 5)
> +#define IPIPE_LAST_EVENT	IPIPE_EVENT_PEXIT
>  #define IPIPE_NR_EVENTS	(IPIPE_LAST_EVENT + 1)

I've merged such support for process cleanup event notification in the
Adeos codebase, it will be available with the next patches to come. I've
named it IPIPE_EVENT_CLEANUP instead of PEXIT though, since that's what it
is about, and EXIT/PEXIT might be confusing.

> +}
> +
> +static inline struct xnppd_holder *
> +xnppd_lookup(unsigned skin_magic, struct mm_struct *mm, unsigned remove)
> +{

Let's separate the lookup and removal ops here, making the latter return
the removed element. All-in-one stuff may save a few lines but is a pain
for readability.

> +static inline void do_pexit_event (struct mm_struct *mm)
> +{
> +    unsigned muxid;
> +    spl_t s;
> +
> +    for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++)
> +    {
> +        xnlock_get_irqsave(&nklock, s);
> +        if (muxtable[muxid].systab && muxtable[muxid].eventcb)
> +        {
> +            struct xnppd_holder *ppd;
> +
> +            ppd = xnppd_lookup(muxtable[muxid].magic, mm, 1);
> +
> +            if (ppd)
> +                muxtable[muxid].eventcb(XNSHADOW_CLIENT_DETACH, ppd);
> +        }
> +        xnlock_put_irqrestore(&nklock, s);
> +    }
> +}
> +

Maybe a better approach would be to simply hash holders referring to busy
muxtable entries on mm struct pointers. This way, we would only have to
scan a list holding one element per attached skin per terminating process,
which will most of the time be a matter of walking through a 1/2-element
queue. I.e. indexing table entries on mm pointers from bind_to_interface()
to ask for the CLIENT_DETACH event to be fired upon process cleanup:

(key: curr->mm) -> (holder: muxtable[muxid1]) -> (holder: muxtable[muxid2]) -> (holder: muxtable[muxid3])

then scanning the list hashed on the terminating process's mm struct, in
order to fire the appropriate event callbacks.

Btw, on 2.4 at least, the global Linux mm_lock is held before firing the
process cleanup notification at Adeos level, so one has to be careful
about what's actually done in those per-skin cleanup handlers.

--
Philippe.
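A rough sketch of the mm-keyed hashing Philippe suggests, built on the
nucleus queue primitives declared in nucleus/queue.h; the bucket count,
hash function and helper names below are arbitrary illustrations, not part
of the posted patch:

    /* Illustration only -- not the actual implementation. */
    #define PPD_HASH_SIZE 32

    static xnqueue_t ppd_hash[PPD_HASH_SIZE];  /* one short queue per bucket */

    static inline unsigned ppd_hash_mm(struct mm_struct *mm)
    {
            return ((unsigned long)mm >> 5) % PPD_HASH_SIZE;
    }

    /* bind_to_interface(): queue one holder per attached skin under curr->mm */
    static inline void ppd_insert(xnppd_holder_t *holder)
    {
            appendq(&ppd_hash[ppd_hash_mm(holder->key.mm)], &holder->link);
    }

    /* process cleanup: walk the (typically one- or two-element) bucket for
     * the dying mm and fire XNSHADOW_CLIENT_DETACH for each skin found. */

The buckets would of course need initq() at init time and the same nklock
protection the patch already uses around the muxtable.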
Re: [Xenomai-core] [patch] prepare-kernel shouldnt alter files thru links
Jim Cromie wrote:
> attached patch corrects a mistake in rev 985, which chmod'd a read-only
> file, even if it was a symlink from a kernel tree cloned with lndir. This
> resulted in a bad original tree for use in building vanilla kernels.
>
> with the patch, the script renames the symlink, copies it to the expected
> name, and *then* chmods it and appends to it. Now vanilla kernel builds
> also using the same tree re-make w/o actually recompiling anything.

Sounds reasonable. Applied, thanks.

--
Philippe.
Re: [Xenomai-core] kconfig help questions
Jim Cromie wrote:
> Going thru the Xenomai Kconfig again, some observations/uncertainties
> came up. I'll make a patch, given feedback.
>
> XENO_OPT_TIMING_PERIODIC
>
> "Aperiodic mode provides for non-constant delays between timer ticks"
>
> the wording here (non-constant delays) left me momentarily wondering if
> _APERIODIC was bad (this, despite the use of 'provides'). Maybe
> "sub-tick delays" makes some sense ..

The fundamental issue is that aperiodic mode is tick-less, in the sense
that it does not define any constant period value; in this respect,
introducing the sub-tick notion would not make much sense. What about
"arbitrary delays" instead?

> XENO_OPT_SHIRQ_LEVEL
>
> Are there any decent estimates or examples of the latency / jitter
> increases if this is enabled? Is it highly cpu or chipset dependent?

More cache misses since we need to scan the entire chain of handlers, and
obviously more processing from the various handlers.

> I presume /proc/interrupts is the place to look to see if you might need
> it, iff any of the devices of interest are shared. my laptop shows
> sharing, my soekris does not
>
>   3:       5  XT-PIC  ehci_hcd:usb1, ohci1394
>   4:       0  XT-PIC  uhci_hcd:usb6
>   7:  708975  XT-PIC  ipw2200, ehci_hcd:usb2, uhci_hcd:usb7
>   8:       1  XT-PIC  rtc
>   9:  437509  XT-PIC  acpi, Intel 82801DB-ICH4, Intel 82801DB-ICH4 Modem,
>                       ohci_hcd:usb3, ohci_hcd:usb4, uhci_hcd:usb5, yenta, eth0
>
> # cat /proc/interrupts
>            CPU0
>   0:  13148086  XT-PIC  timer
>   2:         0  XT-PIC  cascade
>   4:     33084  XT-PIC  serial
>   8:         4  XT-PIC  rtc
>  10:    337134  XT-PIC  eth0
>  11:    926639  XT-PIC  ndiswrapper
>  14:        11  XT-PIC  ide0
> NMI:         0

It's more a sharing in the RT sense, so you likely would not have to share
the IRQs with the non-RT Linux side altogether, since this adds more
complexity to the picture (specifically: how to share the ack/eoi cycle
between the RT and non-RT sides).

> XENO_OPT_SHIRQ_EDGE
>
> does this only apply to the ISA bus?

Not exclusively. x86-wise, the IO-APIC is capable of handling both edge
and level interrupts.

> XENO_SKIN_NATIVE
>
> will add module name
>
> XENO_OPT_NATIVE_INTR
>
> "Note that the preferred way of implementing generic drivers usable
> across all Xenomai interfaces is defined by the Real-Time Driver Model
> (RTDM)."
>
> It doesn't say they're (in)?compatible, should it?

They are not. The native interrupt support provides basic access to the
core interrupt layer that RTDM uses for its own purpose too.

> SUMMARY
>
> I'd like to see estimates of the latency costs associated with each
> choice, where it's practical to do so (given the current unknowables,
> like variations across cpus, chipsets etc). I recognize that such numbers
> are the hoped-for end result of the xenomai-data ML, and perhaps nothing
> real can be said yet (or ever) in Kconfig, except in (overly?) broad
> statements.

IMO, it seems rather difficult to state something sensible regarding the
latency introduced by IRQ sharing in the Kconfig help strings, since it
would basically depend on what the shared IRQ handlers are doing in the
first place.

--
Philippe.
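For the record, the extra cost Philippe describes for level-triggered IRQ
sharing comes from the handler chain walk; below is a minimal sketch of that
idea. The structure and function names are invented for illustration and do
not reflect the nucleus' real interrupt code:

    /* Illustration only. */
    struct rt_irq_handler {
            int (*isr)(void *cookie);
            void *cookie;
            struct rt_irq_handler *next;
    };

    static void shared_irq_trampoline(struct rt_irq_handler *chain)
    {
            int handled = 0;

            /* every handler on the chain is visited: more cache misses and
             * more processing per hardware interrupt than with one handler */
            for (; chain != NULL; chain = chain->next)
                    handled |= chain->isr(chain->cookie);

            /* the IRQ line can only be acknowledged/ended once the whole
             * chain has run, which is part of the ack/eoi sharing problem
             * mentioned above */
            (void)handled;
    }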