Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put
Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >> > Gilles Chanteperdrix wrote: >> > > Jan Kiszka wrote: >> > > > As the #ifdef forest was cut down, I once again looked at xnlock_put. >> > > > Why do you have this safety check for the owner also in production >> code? >> > > >> > > Because only one broken xnlock_put could entail a chain reaction of >> > > broken xnlock sections with code on multiple CPU violating critical >> > > sections. With the test, we prevent the chain reaction. But I agree this >> > > check should not be silent. >> > >> > When there is a bug, then there is bug and we are hosed. That's why we >> > have debug checks for finding such cases in advance. Here I was talking >> > about such a debug check in a hot path on a _production_ system, and >> > that check even had no fault recovery. That appeared pointless to me. >> > >> > Just to avoid misunderstandings: This version is not different from the >> > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced >> > to cover specifically lock debugging. >> > >> > Do you have an idea for some cheap fault recovery for broken locking >> > that we should put in instead? >> >> The fault recovery is to leave the xnlock untouched, in order to avoid >> the chain reaction I was talking about. > > But you may later deadlock on it when trying to reacquire this > unbalanced lock. It can help against double releases, granted. Still, > such cases should be eliminated _ahead_ of release via review, so that > one can finally go for the fast unchecked version in the matured system. > >> Anyway, I think even production >> code should contain some sanity checks, and this one looks cheap. > > I'm fine with a simple check as well. But there should be an _option_ > for such checks, even more if they are supposed to be inlined. > Uninlining reduces this pressure, but it still adds text size to the hot > path. I don't think this is strictly equivalent. The cause of deadlock is no rocket science to diagnose even if it may be painful to fix, so our main problem is critical section breakage. And this particular kind of bugs is extremely sensitive to the instrumentation overhead, to the point where enabling a debug option would turn it into a damned Heisenbug. Unless you want to create a unique debug option to control each and every tiny check, you will likely bind it to a more general debug option, turning on a significant amount of unrelated debug code, which could in turn affect the bug itself. This is where I agree with Gilles, I would rather pay peanuts cycles when acquiring a lock on SMP machines to run this check, than chase wild gooses for days. Some key issues like obvious critical section breakage deserve systematic detection, including in the field. Due to their very nature, maybe you won't be able to trigger some of them elsewhere anyway, because in some complex application cases (the ones we more often have with SMP configs), you may just fake part of the operating environment to debug the application (network feed simulation and so on). But, and there I tend to agree with you, uninlining the containing code is definitely something I would combine with such approach, since I-cache is a precious thing to save. Anyway, measurements will tell. -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put
Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: > > > Gilles Chanteperdrix wrote: > > > > Jan Kiszka wrote: > > > > > As the #ifdef forest was cut down, I once again looked at > > xnlock_put. > > > > > Why do you have this safety check for the owner also in production > > code? > > > > > > > > Because only one broken xnlock_put could entail a chain reaction of > > > > broken xnlock sections with code on multiple CPU violating critical > > > > sections. With the test, we prevent the chain reaction. But I agree > > this > > > > check should not be silent. > > > > > > When there is a bug, then there is bug and we are hosed. That's why we > > > have debug checks for finding such cases in advance. Here I was talking > > > about such a debug check in a hot path on a _production_ system, and > > > that check even had no fault recovery. That appeared pointless to me. > > > > > > Just to avoid misunderstandings: This version is not different from the > > > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced > > > to cover specifically lock debugging. > > > > > > Do you have an idea for some cheap fault recovery for broken locking > > > that we should put in instead? > > > > The fault recovery is to leave the xnlock untouched, in order to avoid > > the chain reaction I was talking about. > > But you may later deadlock on it when trying to reacquire this > unbalanced lock. It can help against double releases, granted. Still, > such cases should be eliminated _ahead_ of release via review, so that > one can finally go for the fast unchecked version in the matured system. A deadlock is far easier to spot than a violated critical section. > > > Anyway, I think even production > > code should contain some sanity checks, and this one looks cheap. > > I'm fine with a simple check as well. But there should be an _option_ > for such checks, even more if they are supposed to be inlined. > Uninlining reduces this pressure, but it still adds text size to the hot > path. No, no option, some sanity checks should be unavoidable, this one is such a check. We will probably never agree I guess. -- Gilles Chanteperdrix. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put
Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > Gilles Chanteperdrix wrote: > > > Jan Kiszka wrote: > > > > As the #ifdef forest was cut down, I once again looked at xnlock_put. > > > > Why do you have this safety check for the owner also in production > code? > > > > > > Because only one broken xnlock_put could entail a chain reaction of > > > broken xnlock sections with code on multiple CPU violating critical > > > sections. With the test, we prevent the chain reaction. But I agree this > > > check should not be silent. > > > > When there is a bug, then there is bug and we are hosed. That's why we > > have debug checks for finding such cases in advance. Here I was talking > > about such a debug check in a hot path on a _production_ system, and > > that check even had no fault recovery. That appeared pointless to me. > > > > Just to avoid misunderstandings: This version is not different from the > > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced > > to cover specifically lock debugging. > > > > Do you have an idea for some cheap fault recovery for broken locking > > that we should put in instead? > > The fault recovery is to leave the xnlock untouched, in order to avoid > the chain reaction I was talking about. But you may later deadlock on it when trying to reacquire this unbalanced lock. It can help against double releases, granted. Still, such cases should be eliminated _ahead_ of release via review, so that one can finally go for the fast unchecked version in the matured system. > Anyway, I think even production > code should contain some sanity checks, and this one looks cheap. I'm fine with a simple check as well. But there should be an _option_ for such checks, even more if they are supposed to be inlined. Uninlining reduces this pressure, but it still adds text size to the hot path. Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put
Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: > > > As the #ifdef forest was cut down, I once again looked at xnlock_put. > > > Why do you have this safety check for the owner also in production code? > > > > Because only one broken xnlock_put could entail a chain reaction of > > broken xnlock sections with code on multiple CPU violating critical > > sections. With the test, we prevent the chain reaction. But I agree this > > check should not be silent. > > When there is a bug, then there is bug and we are hosed. That's why we > have debug checks for finding such cases in advance. Here I was talking > about such a debug check in a hot path on a _production_ system, and > that check even had no fault recovery. That appeared pointless to me. > > Just to avoid misunderstandings: This version is not different from the > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced > to cover specifically lock debugging. > > Do you have an idea for some cheap fault recovery for broken locking > that we should put in instead? The fault recovery is to leave the xnlock untouched, in order to avoid the chain reaction I was talking about. Anyway, I think even production code should contain some sanity checks, and this one looks cheap. -- Gilles Chanteperdrix. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put
Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > As the #ifdef forest was cut down, I once again looked at xnlock_put. > > Why do you have this safety check for the owner also in production code? > > Because only one broken xnlock_put could entail a chain reaction of > broken xnlock sections with code on multiple CPU violating critical > sections. With the test, we prevent the chain reaction. But I agree this > check should not be silent. When there is a bug, then there is bug and we are hosed. That's why we have debug checks for finding such cases in advance. Here I was talking about such a debug check in a hot path on a _production_ system, and that check even had no fault recovery. That appeared pointless to me. Just to avoid misunderstandings: This version is not different from the old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced to cover specifically lock debugging. Do you have an idea for some cheap fault recovery for broken locking that we should put in instead? Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put
Jan Kiszka wrote: > As the #ifdef forest was cut down, I once again looked at xnlock_put. > Why do you have this safety check for the owner also in production code? Because only one broken xnlock_put could entail a chain reaction of broken xnlock sections with code on multiple CPU violating critical sections. With the test, we prevent the chain reaction. But I agree this check should not be silent. -- Gilles Chanteperdrix. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
[Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put
As the #ifdef forest was cut down, I once again looked at xnlock_put. Why do you have this safety check for the owner also in production code? Let's move it into the debug section. That leaves up with static inline void xnlock_put(xnlock_t *lock) { xnlock_dbg_release(lock); atomic_set(&lock->owner, ~0); } Nice - but wait, where is the memory barrier here? Was it in the original code? No! Neither atomic_read, atomic_set, nor the rthal_processor_id imply any kind of barrier to ensure all memory writes inside the protected region are committed before releasing the lock. Cool, there was a real bug under all this dust, and this patch fixes it + it shortens the hot path on SMP production systems. Jan --- include/asm-generic/system.h | 38 +++--- 1 file changed, 19 insertions(+), 19 deletions(-) Index: b/include/asm-generic/system.h === --- a/include/asm-generic/system.h +++ b/include/asm-generic/system.h @@ -163,7 +163,20 @@ static inline void xnlock_dbg_release(xn { extern xnlockinfo_t xnlock_stats[]; unsigned long long lock_time = rthal_rdtsc() - lock->lock_date; - xnlockinfo_t *stats = &xnlock_stats[xnarch_current_cpu()]; + int cpu = xnarch_current_cpu(); + xnlockinfo_t *stats = &xnlock_stats[cpu]; + + if (unlikely(atomic_read(&lock->owner) != cpu)) { + rthal_emergency_console(); + printk(KERN_ERR "Xenomai: unlocking unlocked nucleus lock %p" +" on CPU #%d\n" +" owner = %s:%u (%s(), CPU #%d)\n", + lock, cpu, lock->file, lock->line, lock->function, + lock->cpu); + show_stack(NULL,NULL); + for (;;) + cpu_relax(); + } if (lock_time > stats->lock_time) { stats->lock_time = lock_time; @@ -174,17 +187,6 @@ static inline void xnlock_dbg_release(xn } } -static inline void xnlock_dbg_invalid_release(xnlock_t *lock) -{ - rthal_emergency_console(); - printk(KERN_ERR "Xenomai: unlocking unlocked nucleus lock %p\n" - " owner = %s:%u (%s(), CPU #%d)\n", - lock, lock->file, lock->line, lock->function, lock->cpu); - show_stack(NULL,NULL); - for (;;) - cpu_relax(); -} - #else /* !(CONFIG_SMP && XENO_DEBUG(NUCLEUS)) */ typedef struct { atomic_t owner; } xnlock_t; @@ -198,8 +200,7 @@ typedef struct { atomic_t owner; } xnloc #define XNLOCK_DBG_SPINNING() do { } while (0) #define XNLOCK_DBG_ACQUIRED() do { } while (0) -static inline void xnlock_dbg_release(xnlock_t *lock) { } -static inline void xnlock_dbg_invalid_release(xnlock_t *lock) { } +static inline void xnlock_dbg_release(xnlock_t *lock) { } #endif /* !(CONFIG_SMP && XENO_DEBUG(NUCLEUS)) */ @@ -325,11 +326,10 @@ static inline int __xnlock_get(xnlock_t static inline void xnlock_put(xnlock_t *lock) { - if (likely(atomic_read(&lock->owner) == xnarch_current_cpu())) { - xnlock_dbg_release(lock); - atomic_set(&lock->owner, ~0); - } else - xnlock_dbg_invalid_release(lock); + xnarch_memory_barrier(); + + xnlock_dbg_release(lock); + atomic_set(&lock->owner, ~0); } static inline spl_t ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core