Re: [Xenomai-core] [PATCH 1/4] Refactor generic system.h
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> In order to allow further optimizations of xnlock, I started with
>>> refactoring the related system.h. This improves the readability
>>> significantly, IMHO. It also happens to reduce the text size of
>>> __xnlock_get a bit by avoiding redundant rthal_processor_id
>>> read-outs.
>>>
>>> Another quirk I happened to remove: xnlock debugging depends on
>>> XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug
>>> version of xnlock_t already with XENO_OPT_DEBUG.
>>
>> There is a lot of whitespace change in this patch, which makes it
>> hard to read.
>
> Well, this patch is mostly about whitespace and formatting fixes
> (among which ifdef reduction falls for me as well). But I can split
> it up if desired.
>
>> Anyway, there are a few things I do not like in this patch:
>> - macros which make reference to symbols defined elsewhere
>
> You mean XNLOCK_DBG_PREPARE_ACQUIRE vs. XNLOCK_DBG_SPINNING/ACQUIRED?
> Granted, not nice, but so far the most compact approach I found. My
> goal was to keep the lock implementations as pure as possible (you
> can easily ignore the debug stuff now when reading xnlock_get/put).
>
>> - function arguments as macros; I find the #ifdef with the different
>>   function prototypes more readable, since the code can be read
>>   without having to look at a different place.
>
> I'm open to learning a third way to achieve what we need. I'm just
> convinced that the old way was far worse. For a better suggestion,
> please consider that the number of variants increases with my ticket
> lock. That's why I tried to stuff things in macros.
>
> Hmm, maybe we should simply get rid of the file/line/function stuff
> completely and switch to IP + ksyms. What do you think?

I do not want to leave this in a dead end. IMO, your approach makes
xnlock_get readable in the non-debugging case at the expense of its
readability in the debugging case. I would rather see the two
implementations with a unique ifdef.

Granted, there will be some code duplication, but it will not be that
much, and this allows us to move the debugging version out of line
while keeping the non-debugging case inline.

-- 
Gilles Chanteperdrix.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
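The "two implementations with a unique ifdef" layout suggested above could look roughly like the following user-space sketch. It is not taken from the Xenomai tree: demo_lock_t, demo_lock_get(), demo_lock_get_debug(), demo_lock_put() and DEMO_LOCK_DEBUG are hypothetical names, C11 atomics stand in for the kernel's atomic_t, and DEMO_LOCK_DEBUG stands in for the CONFIG_SMP && XENO_DEBUG(NUCLEUS) test.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical switch standing in for CONFIG_SMP && XENO_DEBUG(NUCLEUS). */
#ifndef DEMO_LOCK_DEBUG
#define DEMO_LOCK_DEBUG 1
#endif

#define DEMO_LOCK_FREE (-1)
#define DEMO_LOCK_INIT { .owner = DEMO_LOCK_FREE }

#if DEMO_LOCK_DEBUG

typedef struct {
	atomic_int owner;	/* DEMO_LOCK_FREE or the owning CPU */
	const char *file;	/* where the lock was last taken */
	const char *function;
	unsigned line;
} demo_lock_t;

/*
 * Debug variant: a regular (non-inline) function. In a real header this
 * would only be declared here and defined once in a .c file.
 */
void demo_lock_get_debug(demo_lock_t *lock, int cpu, const char *file,
			 unsigned line, const char *function)
{
	int expected = DEMO_LOCK_FREE;

	/* Spin until the owner field changes from "free" to our CPU. */
	while (!atomic_compare_exchange_weak(&lock->owner, &expected, cpu))
		expected = DEMO_LOCK_FREE;

	lock->file = file;
	lock->function = function;
	lock->line = line;
}

#define demo_lock_get(lock, cpu) \
	demo_lock_get_debug(lock, cpu, __FILE__, __LINE__, __func__)

#else /* !DEMO_LOCK_DEBUG */

typedef struct {
	atomic_int owner;
} demo_lock_t;

/* Non-debug variant: small enough to stay inline, no debug code at all. */
static inline void demo_lock_get(demo_lock_t *lock, int cpu)
{
	int expected = DEMO_LOCK_FREE;

	while (!atomic_compare_exchange_weak(&lock->owner, &expected, cpu))
		expected = DEMO_LOCK_FREE;
}

#endif /* !DEMO_LOCK_DEBUG */

/* Release is identical in both variants in this sketch. */
static inline void demo_lock_put(demo_lock_t *lock)
{
	assert(atomic_load(&lock->owner) != DEMO_LOCK_FREE);
	atomic_store(&lock->owner, DEMO_LOCK_FREE);
}
```

The acquire loop is duplicated, which is the cost mentioned above; in exchange each variant can be read top to bottom without expanding any macro.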
Re: [Xenomai-core] [PATCH 1/4] Refactor generic system.h
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> In order to allow further optimizations of xnlock, I started with
>>>> refactoring the related system.h. This improves the readability
>>>> significantly, IMHO. It also happens to reduce the text size of
>>>> __xnlock_get a bit by avoiding redundant rthal_processor_id
>>>> read-outs.
>>>>
>>>> Another quirk I happened to remove: xnlock debugging depends on
>>>> XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug
>>>> version of xnlock_t already with XENO_OPT_DEBUG.
>>>
>>> There is a lot of whitespace change in this patch, which makes it
>>> hard to read.
>>
>> Well, this patch is mostly about whitespace and formatting fixes
>> (among which ifdef reduction falls for me as well). But I can split
>> it up if desired.
>>
>>> Anyway, there are a few things I do not like in this patch:
>>> - macros which make reference to symbols defined elsewhere
>>
>> You mean XNLOCK_DBG_PREPARE_ACQUIRE vs. XNLOCK_DBG_SPINNING/ACQUIRED?
>> Granted, not nice, but so far the most compact approach I found. My
>> goal was to keep the lock implementations as pure as possible (you
>> can easily ignore the debug stuff now when reading xnlock_get/put).
>>
>>> - function arguments as macros; I find the #ifdef with the
>>>   different function prototypes more readable, since the code can
>>>   be read without having to look at a different place.
>>
>> I'm open to learning a third way to achieve what we need. I'm just
>> convinced that the old way was far worse. For a better suggestion,
>> please consider that the number of variants increases with my ticket
>> lock. That's why I tried to stuff things in macros.
>>
>> Hmm, maybe we should simply get rid of the file/line/function stuff
>> completely and switch to IP + ksyms. What do you think?
>
> I do not want to leave this in a dead end. IMO, your approach makes
> xnlock_get readable in the non-debugging case at the expense of its
> readability in the debugging case. I would rather see the two
> implementations with a unique ifdef.
>
> Granted, there will be some code duplication, but it will not be that
> much, and this allows us to move the debugging version out of line
> while keeping the non-debugging case inline.

Don't panic. I'm sitting on a new version of this patch series, only
running a final benchmark to estimate the gain. This unfortunately
takes a lot of time.

BTW, the series will also add lock debugging for UP, and it beautifies
the refactoring a bit more, addressing your concerns.

Jan
[Xenomai-core] [PATCH 1/4] Refactor generic system.h
In order to allow further optimizations of xnlock, I started with
refactoring the related system.h. This improves the readability
significantly, IMHO. It also happens to reduce the text size of
__xnlock_get a bit by avoiding redundant rthal_processor_id read-outs.

Another quirk I happened to remove: xnlock debugging depends on
XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug
version of xnlock_t already with XENO_OPT_DEBUG.

Jan

---
 include/asm-generic/system.h | 412 +--
 1 file changed, 204 insertions(+), 208 deletions(-)

Index: b/include/asm-generic/system.h
===================================================================
--- a/include/asm-generic/system.h
+++ b/include/asm-generic/system.h
@@ -47,18 +47,22 @@
 #define CONFIG_XENO_OPT_DEBUG_NUCLEUS 0
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Time base export */
 #define xnarch_declare_tbase(base) do { } while(0)
 
 /* Tracer interface */
 #define xnarch_trace_max_begin(v) rthal_trace_max_begin(v)
-#define xnarch_trace_max_end(v) rthal_trace_max_end(v)
+#define xnarch_trace_max_end(v) rthal_trace_max_end(v)
 #define xnarch_trace_max_reset() rthal_trace_max_reset()
 #define xnarch_trace_user_start() rthal_trace_user_start()
 #define xnarch_trace_user_stop(v) rthal_trace_user_stop(v)
-#define xnarch_trace_user_freeze(v, once) rthal_trace_user_freeze(v, once)
+#define xnarch_trace_user_freeze(v, once) rthal_trace_user_freeze(v, once)
 #define xnarch_trace_special(id, v) rthal_trace_special(id, v)
-#define xnarch_trace_special_u64(id, v) rthal_trace_special_u64(id, v)
+#define xnarch_trace_special_u64(id, v) rthal_trace_special_u64(id, v)
 #define xnarch_trace_pid(pid, prio) rthal_trace_pid(pid, prio)
 #define xnarch_trace_panic_freeze() rthal_trace_panic_freeze()
 #define xnarch_trace_panic_dump() rthal_trace_panic_dump()
@@ -81,26 +85,32 @@ typedef unsigned long spl_t;
 #define spltest() rthal_local_irq_test()
 #define splget(x) rthal_local_irq_flags(x)
 
-#if defined(CONFIG_SMP) && defined(CONFIG_XENO_OPT_DEBUG)
+static inline unsigned xnarch_current_cpu(void)
+{
+	return rthal_processor_id();
+}
+
+#if defined(CONFIG_SMP) && XENO_DEBUG(NUCLEUS)
+
 typedef struct {
-unsigned long long spin_time;
-unsigned long long lock_time;
-const char *file;
-const char *function;
-unsigned line;
+	unsigned long long spin_time;
+	unsigned long long lock_time;
+	const char *file;
+	const char *function;
+	unsigned line;
 } xnlockinfo_t;
 
 typedef struct {
-atomic_t owner;
-const char *file;
-const char *function;
-unsigned line;
-int cpu;
-unsigned long long spin_time;
-unsigned long long lock_date;
+	atomic_t owner;
+	const char *file;
+	const char *function;
+	unsigned line;
+	int cpu;
+	unsigned long long spin_time;
+	unsigned long long lock_date;
 } xnlock_t;
 
@@ -114,70 +124,137 @@ typedef struct {
 0LL, \
 }
 
-#else /* !(CONFIG_SMP && CONFIG_XENO_OPT_DEBUG) */
+#define XNLOCK_DBG_CONTEXT , __FILE__, __LINE__, __FUNCTION__
+#define XNLOCK_DBG_CONTEXT_ARGS \
+	, const char *file, int line, const char *function
+#define XNLOCK_DBG_PASS_CONTEXT , file, line, function
+
+#define XNLOCK_DBG_PREPARE_ACQUIRE() \
+	unsigned long long __lock_date = rthal_rdtsc(); \
+	unsigned __spin_limit = 300
+
+#define XNLOCK_DBG_SPINNING() \
+	do { \
+		if (__spin_limit-- == 0) { \
+			rthal_emergency_console(); \
+			printk(KERN_ERR \
+			       "Xenomai: stuck on nucleus lock %p\n" \
+			       " waiter = %s:%u (%s(), CPU #%d)\n" \
+			       " owner = %s:%u (%s(), CPU #%d)\n", \
+			       lock, file, line, function, cpu, lock->file, \
+			       lock->line, lock->function, lock->cpu); \
+			show_stack(NULL, NULL); \
+			for (;;) \
+				cpu_relax(); \
+		} \
+	} while (0)
+
+#define XNLOCK_DBG_ACQUIRED() \
+	do { \
+		lock->spin_time = rthal_rdtsc() - __lock_date; \
+		lock->lock_date = __lock_date; \
+		lock->file = file; \
+		lock->function = function; \
+		lock->line = line; \
+		lock->cpu = cpu; \
+	} while (0)
+
+static inline void xnlock_dbg_release(xnlock_t *lock)
+{
+	extern xnlockinfo_t xnlock_stats[];
+	unsigned long long lock_time = rthal_rdtsc() - lock->lock_date;
+	xnlockinfo_t *stats = &xnlock_stats[xnarch_current_cpu()];
+
+	if (lock_time > stats->lock_time) {
+		stats->lock_time = lock_time;
+		stats->spin_time = lock->spin_time;
+		stats->file = lock->file;
+		stats->function = lock->function;
+		stats->line = lock->line;
+	}
+}
+
+static inline void xnlock_dbg_invalid_release(xnlock_t *lock)
+{
+	rthal_emergency_console();
+	printk(KERN_ERR "Xenomai: unlocking unlocked nucleus lock %p\n"
+	       " owner = %s:%u (%s(), CPU #%d)\n",
+	       lock, lock->file, lock->line, lock->function, lock->cpu);
+	show_stack(NULL, NULL);
+	for (;;)
+		cpu_relax();
+}
+
+#else /* !(CONFIG_SMP && XENO_DEBUG(NUCLEUS))
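To make the context-passing macros in this patch easier to follow, here is a reduced user-space sketch of the same idea: one function definition whose parameter list grows or shrinks depending on a debug switch. Everything prefixed ctx_ or CTX_ is hypothetical and only mirrors the shape of XNLOCK_DBG_CONTEXT, XNLOCK_DBG_CONTEXT_ARGS and XNLOCK_DBG_ACQUIRED; it is not the Xenomai code itself.

```c
#include <assert.h>

/* Hypothetical switch standing in for CONFIG_SMP && XENO_DEBUG(NUCLEUS). */
#ifndef CTX_DEBUG
#define CTX_DEBUG 1
#endif

#if CTX_DEBUG

typedef struct {
	int owner;
	const char *file;
	int line;
	const char *function;
} ctx_lock_t;

/* Note the leading comma: these are appended to an argument list. */
#define CTX_DBG_CONTEXT		, __FILE__, __LINE__, __func__
#define CTX_DBG_CONTEXT_ARGS	, const char *file, int line, const char *function

/*
 * Relies on 'lock', 'file', 'line' and 'function' being in scope at the
 * expansion site, i.e. it references symbols defined elsewhere.
 */
#define CTX_DBG_ACQUIRED() \
	do { \
		lock->file = file; \
		lock->line = line; \
		lock->function = function; \
	} while (0)

#else /* !CTX_DEBUG */

typedef struct {
	int owner;
} ctx_lock_t;

#define CTX_DBG_CONTEXT
#define CTX_DBG_CONTEXT_ARGS
#define CTX_DBG_ACQUIRED()	do { } while (0)

#endif /* !CTX_DEBUG */

/* A single definition serves both configurations. */
static inline void ctx_lock_get_impl(ctx_lock_t *lock, int cpu CTX_DBG_CONTEXT_ARGS)
{
	assert(lock != 0);
	lock->owner = cpu;
	CTX_DBG_ACQUIRED();
}

#define ctx_lock_get(lock, cpu) ctx_lock_get_impl(lock, cpu CTX_DBG_CONTEXT)
```

With CTX_DEBUG set, ctx_lock_get(&l, 3) expands to ctx_lock_get_impl(&l, 3, __FILE__, __LINE__, __func__); without it, the call collapses to ctx_lock_get_impl(&l, 3).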
Re: [Xenomai-core] [PATCH 1/4] Refactor generic system.h
Jan Kiszka wrote:
> In order to allow further optimizations of xnlock, I started with
> refactoring the related system.h. This improves the readability
> significantly, IMHO. It also happens to reduce the text size of
> __xnlock_get a bit by avoiding redundant rthal_processor_id
> read-outs.
>
> Another quirk I happened to remove: xnlock debugging depends on
> XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug
> version of xnlock_t already with XENO_OPT_DEBUG.

There is a lot of whitespace change in this patch, which makes it hard
to read.

Anyway, there are a few things I do not like in this patch:
- macros which make reference to symbols defined elsewhere
- function arguments as macros; I find the #ifdef with the different
  function prototypes more readable, since the code can be read
  without having to look at a different place.

Something that could be interesting would be to be able to enable
spinlock debugging in UP, which would enable real debugging of xnlocks
in this case. I made an attempt at doing this on ARM some time ago;
this generated a kernel that would lock up at boot. But I think this is
something we should sort out.

-- 
Gilles Chanteperdrix.
Re: [Xenomai-core] [PATCH 1/4] Refactor generic system.h
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> In order to allow further optimizations of xnlock, I started with
>> refactoring the related system.h. This improves the readability
>> significantly, IMHO. It also happens to reduce the text size of
>> __xnlock_get a bit by avoiding redundant rthal_processor_id
>> read-outs.
>>
>> Another quirk I happened to remove: xnlock debugging depends on
>> XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug
>> version of xnlock_t already with XENO_OPT_DEBUG.
>
> There is a lot of whitespace change in this patch, which makes it
> hard to read.

Well, this patch is mostly about whitespace and formatting fixes (among
which ifdef reduction falls for me as well). But I can split it up if
desired.

> Anyway, there are a few things I do not like in this patch:
> - macros which make reference to symbols defined elsewhere

You mean XNLOCK_DBG_PREPARE_ACQUIRE vs. XNLOCK_DBG_SPINNING/ACQUIRED?
Granted, not nice, but so far the most compact approach I found. My
goal was to keep the lock implementations as pure as possible (you can
easily ignore the debug stuff now when reading xnlock_get/put).

> - function arguments as macros; I find the #ifdef with the different
>   function prototypes more readable, since the code can be read
>   without having to look at a different place.

I'm open to learning a third way to achieve what we need. I'm just
convinced that the old way was far worse. For a better suggestion,
please consider that the number of variants increases with my ticket
lock. That's why I tried to stuff things in macros.

Hmm, maybe we should simply get rid of the file/line/function stuff
completely and switch to IP + ksyms. What do you think?

> Something that could be interesting would be to be able to enable
> spinlock debugging in UP, which would enable real debugging of
> xnlocks in this case. I made an attempt at doing this on ARM some
> time ago; this generated a kernel that would lock up at boot. But I
> think this is something we should sort out.

Yeah, sounds good and should be feasible. Will check.

Jan
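The "IP + ksyms" idea floated above could be sketched as follows: instead of passing file, line and function down from the call site, the lock records the caller's instruction pointer and leaves symbol resolution to whoever prints it. This is a user-space illustration only. ip_lock_t, ip_lock_get() and ip_lock_report() are hypothetical names, __builtin_return_address() is a GCC/Clang builtin, and the sketch prints the raw address where kernel code would resolve it through the symbol table.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
	int owner;
	void *ip;	/* return address of the code that took the lock */
} ip_lock_t;

/*
 * Kept out of line on purpose: if this were inlined, the return address
 * would belong to the caller's caller instead of the locking site.
 */
__attribute__((noinline))
void ip_lock_get(ip_lock_t *lock, int cpu)
{
	assert(lock != NULL);
	lock->owner = cpu;
	lock->ip = __builtin_return_address(0);
}

/* Raw address only; mapping it to a symbol name is left to other tools. */
void ip_lock_report(const ip_lock_t *lock)
{
	printf("owner = CPU #%d, taken at %p\n", lock->owner, lock->ip);
}
```

The call sites need no extra arguments and no macros, at the price of a less direct report: an address has to be mapped back to source, whereas file and line can be read as they are.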
Re: [Xenomai-core] [PATCH 1/4] Refactor generic system.h
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> In order to allow further optimizations of xnlock, I started with
>>> refactoring the related system.h. This improves the readability
>>> significantly, IMHO. It also happens to reduce the text size of
>>> __xnlock_get a bit by avoiding redundant rthal_processor_id
>>> read-outs.
>>>
>>> Another quirk I happened to remove: xnlock debugging depends on
>>> XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug
>>> version of xnlock_t already with XENO_OPT_DEBUG.
>>
>> There is a lot of whitespace change in this patch, which makes it
>> hard to read.
>
> Well, this patch is mostly about whitespace and formatting fixes
> (among which ifdef reduction falls for me as well). But I can split
> it up if desired.
>
>> Anyway, there are a few things I do not like in this patch:
>> - macros which make reference to symbols defined elsewhere
>
> You mean XNLOCK_DBG_PREPARE_ACQUIRE vs. XNLOCK_DBG_SPINNING/ACQUIRED?
> Granted, not nice, but so far the most compact approach I found. My
> goal was to keep the lock implementations as pure as possible (you
> can easily ignore the debug stuff now when reading xnlock_get/put).
>
>> - function arguments as macros; I find the #ifdef with the different
>>   function prototypes more readable, since the code can be read
>>   without having to look at a different place.
>
> I'm open to learning a third way to achieve what we need. I'm just
> convinced that the old way was far worse.

I do not see a third approach. Maybe passing all the arguments to the
function, and counting on the optimizer to remove useless arguments
when inlining?

> For a better suggestion, please consider that the number of variants
> increases with my ticket lock. That's why I tried to stuff things in
> macros.
>
> Hmm, maybe we should simply get rid of the file/line/function stuff
> completely and switch to IP + ksyms. What do you think?

I find the file/line approach more precise. print_symbol gives you an
offset, you have to disassemble to understand what it means, and it
does not see inline functions.

-- 
Gilles Chanteperdrix.
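The "pass all the arguments and count on the optimizer" option mentioned above might look like the sketch below. The prototype is the same in both configurations; only the body differs. All opt_ and OPT_ names are hypothetical, and whether the compiler really drops the unused arguments depends on the function being inlined and on the optimization level, which this sketch does not verify.

```c
#include <assert.h>

/* Hypothetical switch standing in for the nucleus debug option. */
#ifndef OPT_DEBUG
#define OPT_DEBUG 0
#endif

typedef struct {
	int owner;
#if OPT_DEBUG
	const char *file;
	int line;
	const char *function;
#endif
} opt_lock_t;

/* One prototype for both cases: the context is always passed in. */
static inline void opt_lock_get_impl(opt_lock_t *lock, int cpu,
				     const char *file, int line,
				     const char *function)
{
	assert(lock != 0);
	lock->owner = cpu;
#if OPT_DEBUG
	lock->file = file;
	lock->line = line;
	lock->function = function;
#else
	/* Unused here; an inlining compiler is expected to discard them. */
	(void)file;
	(void)line;
	(void)function;
#endif
}

#define opt_lock_get(lock, cpu) \
	opt_lock_get_impl(lock, cpu, __FILE__, __LINE__, __func__)
```

Compared with the macro-built parameter list, the signature can be read directly, but a conditional remains inside the function body.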