Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put

2008-02-23 Thread Philippe Gerum
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>  > Gilles Chanteperdrix wrote:
>>  > > Jan Kiszka wrote:
>>  > >  > As the #ifdef forest was cut down, I once again looked at xnlock_put.
>>  > >  > Why do you have this safety check for the owner also in production 
>> code?
>>  > > 
>>  > > Because only one broken xnlock_put could entail a chain reaction of
>>  > > broken xnlock sections with code on multiple CPU violating critical
>>  > > sections. With the test, we prevent the chain reaction. But I agree this
>>  > > check should not be silent.
>>  > 
>>  > When there is a bug, then there is bug and we are hosed. That's why we
>>  > have debug checks for finding such cases in advance. Here I was talking
>>  > about such a debug check in a hot path on a _production_ system, and
>>  > that check even had no fault recovery. That appeared pointless to me.
>>  > 
>>  > Just to avoid misunderstandings: This version is not different from the
>>  > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced
>>  > to cover specifically lock debugging.
>>  > 
>>  > Do you have an idea for some cheap fault recovery for broken locking
>>  > that we should put in instead?
>>
>> The fault recovery is to leave the xnlock untouched, in order to avoid
>> the chain reaction I was talking about.
> 
> But you may later deadlock on it when trying to reacquire this
> unbalanced lock. It can help against double releases, granted. Still,
> such cases should be eliminated _ahead_ of release via review, so that
> one can finally go for the fast unchecked version in the matured system.
> 
>> Anyway, I think even production
>> code should contain some sanity checks, and this one looks cheap.
> 
> I'm fine with a simple check as well. But there should be an _option_
> for such checks, even more if they are supposed to be inlined.
> Uninlining reduces this pressure, but it still adds text size to the hot
> path.

I don't think this is strictly equivalent. The cause of deadlock is no
rocket science to diagnose even if it may be painful to fix, so our main
problem is critical section breakage. And this particular kind of bugs
is extremely sensitive to the instrumentation overhead, to the point
where enabling a debug option would turn it into a damned Heisenbug.
Unless you want to create a unique debug option to control each and
every tiny check, you will likely bind it to a more general debug
option, turning on a significant amount of unrelated debug code, which
could in turn affect the bug itself. This is where I agree with Gilles,
I would rather pay peanuts cycles when acquiring a lock on SMP machines
to run this check, than chase wild gooses for days.

Some key issues like obvious critical section breakage deserve
systematic detection, including in the field. Due to their very nature,
maybe you won't be able to trigger some of them elsewhere anyway,
because in some complex application cases (the ones we more often have
with SMP configs), you may just fake part of the operating environment
to debug the application (network feed simulation and so on).

But, and there I tend to agree with you, uninlining the containing code
is definitely something I would combine with such approach, since
I-cache is a precious thing to save. Anyway, measurements will tell.

-- 
Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put

2008-02-23 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 > Gilles Chanteperdrix wrote:
 > > Jan Kiszka wrote:
 > >  > Gilles Chanteperdrix wrote:
 > >  > > Jan Kiszka wrote:
 > >  > >  > As the #ifdef forest was cut down, I once again looked at 
 > > xnlock_put.
 > >  > >  > Why do you have this safety check for the owner also in production 
 > > code?
 > >  > > 
 > >  > > Because only one broken xnlock_put could entail a chain reaction of
 > >  > > broken xnlock sections with code on multiple CPU violating critical
 > >  > > sections. With the test, we prevent the chain reaction. But I agree 
 > > this
 > >  > > check should not be silent.
 > >  > 
 > >  > When there is a bug, then there is bug and we are hosed. That's why we
 > >  > have debug checks for finding such cases in advance. Here I was talking
 > >  > about such a debug check in a hot path on a _production_ system, and
 > >  > that check even had no fault recovery. That appeared pointless to me.
 > >  > 
 > >  > Just to avoid misunderstandings: This version is not different from the
 > >  > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced
 > >  > to cover specifically lock debugging.
 > >  > 
 > >  > Do you have an idea for some cheap fault recovery for broken locking
 > >  > that we should put in instead?
 > > 
 > > The fault recovery is to leave the xnlock untouched, in order to avoid
 > > the chain reaction I was talking about.
 > 
 > But you may later deadlock on it when trying to reacquire this
 > unbalanced lock. It can help against double releases, granted. Still,
 > such cases should be eliminated _ahead_ of release via review, so that
 > one can finally go for the fast unchecked version in the matured system.

A deadlock is far easier to spot than a violated critical section.

 > 
 > > Anyway, I think even production
 > > code should contain some sanity checks, and this one looks cheap.
 > 
 > I'm fine with a simple check as well. But there should be an _option_
 > for such checks, even more if they are supposed to be inlined.
 > Uninlining reduces this pressure, but it still adds text size to the hot
 > path.

No, no option, some sanity checks should be unavoidable, this one is
such a check. We will probably never agree I guess.

-- 


Gilles Chanteperdrix.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put

2008-02-23 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>  > Gilles Chanteperdrix wrote:
>  > > Jan Kiszka wrote:
>  > >  > As the #ifdef forest was cut down, I once again looked at xnlock_put.
>  > >  > Why do you have this safety check for the owner also in production 
> code?
>  > > 
>  > > Because only one broken xnlock_put could entail a chain reaction of
>  > > broken xnlock sections with code on multiple CPU violating critical
>  > > sections. With the test, we prevent the chain reaction. But I agree this
>  > > check should not be silent.
>  > 
>  > When there is a bug, then there is bug and we are hosed. That's why we
>  > have debug checks for finding such cases in advance. Here I was talking
>  > about such a debug check in a hot path on a _production_ system, and
>  > that check even had no fault recovery. That appeared pointless to me.
>  > 
>  > Just to avoid misunderstandings: This version is not different from the
>  > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced
>  > to cover specifically lock debugging.
>  > 
>  > Do you have an idea for some cheap fault recovery for broken locking
>  > that we should put in instead?
> 
> The fault recovery is to leave the xnlock untouched, in order to avoid
> the chain reaction I was talking about.

But you may later deadlock on it when trying to reacquire this
unbalanced lock. It can help against double releases, granted. Still,
such cases should be eliminated _ahead_ of release via review, so that
one can finally go for the fast unchecked version in the matured system.

> Anyway, I think even production
> code should contain some sanity checks, and this one looks cheap.

I'm fine with a simple check as well. But there should be an _option_
for such checks, even more if they are supposed to be inlined.
Uninlining reduces this pressure, but it still adds text size to the hot
path.

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put

2008-02-23 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 > Gilles Chanteperdrix wrote:
 > > Jan Kiszka wrote:
 > >  > As the #ifdef forest was cut down, I once again looked at xnlock_put.
 > >  > Why do you have this safety check for the owner also in production code?
 > > 
 > > Because only one broken xnlock_put could entail a chain reaction of
 > > broken xnlock sections with code on multiple CPU violating critical
 > > sections. With the test, we prevent the chain reaction. But I agree this
 > > check should not be silent.
 > 
 > When there is a bug, then there is bug and we are hosed. That's why we
 > have debug checks for finding such cases in advance. Here I was talking
 > about such a debug check in a hot path on a _production_ system, and
 > that check even had no fault recovery. That appeared pointless to me.
 > 
 > Just to avoid misunderstandings: This version is not different from the
 > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced
 > to cover specifically lock debugging.
 > 
 > Do you have an idea for some cheap fault recovery for broken locking
 > that we should put in instead?

The fault recovery is to leave the xnlock untouched, in order to avoid
the chain reaction I was talking about. Anyway, I think even production
code should contain some sanity checks, and this one looks cheap.


-- 


Gilles Chanteperdrix.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put

2008-02-23 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>  > As the #ifdef forest was cut down, I once again looked at xnlock_put.
>  > Why do you have this safety check for the owner also in production code?
> 
> Because only one broken xnlock_put could entail a chain reaction of
> broken xnlock sections with code on multiple CPU violating critical
> sections. With the test, we prevent the chain reaction. But I agree this
> check should not be silent.

When there is a bug, then there is bug and we are hosed. That's why we
have debug checks for finding such cases in advance. Here I was talking
about such a debug check in a hot path on a _production_ system, and
that check even had no fault recovery. That appeared pointless to me.

Just to avoid misunderstandings: This version is not different from the
old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced
to cover specifically lock debugging.

Do you have an idea for some cheap fault recovery for broken locking
that we should put in instead?

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put

2008-02-23 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 > As the #ifdef forest was cut down, I once again looked at xnlock_put.
 > Why do you have this safety check for the owner also in production code?

Because only one broken xnlock_put could entail a chain reaction of
broken xnlock sections with code on multiple CPU violating critical
sections. With the test, we prevent the chain reaction. But I agree this
check should not be silent.

-- 


Gilles Chanteperdrix.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


[Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put

2008-02-23 Thread Jan Kiszka
As the #ifdef forest was cut down, I once again looked at xnlock_put.
Why do you have this safety check for the owner also in production code?
Let's move it into the debug section. That leaves up with

static inline void xnlock_put(xnlock_t *lock)
{
xnlock_dbg_release(lock);
atomic_set(&lock->owner, ~0);
}

Nice - but wait, where is the memory barrier here? Was it in the
original code? No! Neither atomic_read, atomic_set, nor the
rthal_processor_id imply any kind of barrier to ensure all memory writes
inside the protected region are committed before releasing the lock.

Cool, there was a real bug under all this dust, and this patch fixes it
+ it shortens the hot path on SMP production systems.

Jan

---
 include/asm-generic/system.h |   38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

Index: b/include/asm-generic/system.h
===
--- a/include/asm-generic/system.h
+++ b/include/asm-generic/system.h
@@ -163,7 +163,20 @@ static inline void xnlock_dbg_release(xn
 {
 	extern xnlockinfo_t xnlock_stats[];
 	unsigned long long lock_time = rthal_rdtsc() - lock->lock_date;
-	xnlockinfo_t *stats = &xnlock_stats[xnarch_current_cpu()];
+	int cpu = xnarch_current_cpu();
+	xnlockinfo_t *stats = &xnlock_stats[cpu];
+
+	if (unlikely(atomic_read(&lock->owner) != cpu)) {
+		rthal_emergency_console();
+		printk(KERN_ERR "Xenomai: unlocking unlocked nucleus lock %p"
+" on CPU #%d\n"
+" owner  = %s:%u (%s(), CPU #%d)\n",
+		   lock, cpu, lock->file, lock->line, lock->function,
+		   lock->cpu);
+		show_stack(NULL,NULL);
+		for (;;)
+			cpu_relax();
+	}
 
 	if (lock_time > stats->lock_time) {
 		stats->lock_time = lock_time;
@@ -174,17 +187,6 @@ static inline void xnlock_dbg_release(xn
 	}
 }
 
-static inline void xnlock_dbg_invalid_release(xnlock_t *lock)
-{
-	rthal_emergency_console();
-	printk(KERN_ERR "Xenomai: unlocking unlocked nucleus lock %p\n"
-			"   owner  = %s:%u (%s(), CPU #%d)\n",
-	   lock, lock->file, lock->line, lock->function, lock->cpu);
-	show_stack(NULL,NULL);
-	for (;;)
-		cpu_relax();
-}
-
 #else /* !(CONFIG_SMP && XENO_DEBUG(NUCLEUS)) */
 
 typedef struct { atomic_t owner; } xnlock_t;
@@ -198,8 +200,7 @@ typedef struct { atomic_t owner; } xnloc
 #define XNLOCK_DBG_SPINNING()		do { } while (0)
 #define XNLOCK_DBG_ACQUIRED()		do { } while (0)
 
-static inline void xnlock_dbg_release(xnlock_t *lock)		{ }
-static inline void xnlock_dbg_invalid_release(xnlock_t *lock)	{ }
+static inline void xnlock_dbg_release(xnlock_t *lock)	{ }
 
 #endif /* !(CONFIG_SMP && XENO_DEBUG(NUCLEUS)) */
 
@@ -325,11 +326,10 @@ static inline int __xnlock_get(xnlock_t 
 
 static inline void xnlock_put(xnlock_t *lock)
 {
-	if (likely(atomic_read(&lock->owner) == xnarch_current_cpu())) {
-		xnlock_dbg_release(lock);
-		atomic_set(&lock->owner, ~0);
-	} else
-		xnlock_dbg_invalid_release(lock);
+	xnarch_memory_barrier();
+
+	xnlock_dbg_release(lock);
+	atomic_set(&lock->owner, ~0);
 }
 
 static inline spl_t
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core