On Wed, May 26, 2010 at 7:00 PM, Steven Dake <[email protected]> wrote:
> On 05/26/2010 03:37 AM, Colin wrote:
>>
>> Hi Again,
>>
>> looking at the differences between 1.2.0 and 1.2.3 it appears that
>> some problem was detected in logsys.c -- the log messages have a
>> dedicated queue independent of the flight recorder, the inter-thread
>> signalling was changed from condition variables to semaphores, ...
>> This seems to make the code even more complicated, because now there's
>> one mechanism for signalling, and one for locking, although condition
>> variables should be quite sufficient for a thread-safe queue and a
>
> Using the condition variable's lock in the main log api creates blocking -
> one of the four requirements we want to solve - see below.

Hm, I'm not convinced that this is the case, to the contrary, the new
code with the semaphores has exactly the same potential for blocking
as the old code does. Let me elaborate:

Let's take for grantet that, as long as you don't use some fancy
"lock-free" algorithms, a thread-safe queue needs some
mutual-exclusion mechanism (in logsys.c that would be
logsys_lock()/_unlock() in 1.2.0 and logsys_wthread_lock()/_unlock()
in 1.2.3).

So you end up with something like the following, in C-ish
(pseudo-)code (which might show that I haven't touched plain C for a
while, only C++):

void enqueue( whatever )
{
   lock();
   thread_unaware_enqueue_implementation( whatever );
   cond_signal();
   unlock();
}

void consumer()
{
   while ( true ) {
      lock();
      while ( thread_unaware_test_whether_queue_empty() ) {  // A
         cond_wait();
      }
      whatever = thread_unaware_dequeue_one_element();  // B
      unlock();
      // ... process ...
   }
}

This is as well behaved as possible in the sense that no blocking
operation is performed while the lock is held, i.e. as long as a
thread is not preempted while holding the lock, the lock will only be
held for a very short amount of time. So the only scenario in which
the, in the case of corosync: more time-ciritical compared to the
consumer, producer can block, is when the OS happens to preempt the
logsys worker thread in one of the lines marked A or B. (Remember that
the mutex is unlocked while sleeping in cond_wait()).

Changing the signalling from this canonical pattern to an additional
semaphore does not change this blocking behaviour in any way. The only
thing that happens is that the signalling now requires an additional
synchronisation primitive...

More precisely, if logsys_worker_thread() in 1.2.3 gets preempted
after logsys_wthread_lock(), but before logsys_wthread_unlock(), then
any thread calling log_printf_to_logs_wthread() will block in
logsys_thread_lock() until the worker thread does its thing (and
unlocks).

If you really want to get rid of any possibility of blocking in the
producer, even if it's just for the time that the consumer got
preempted with the lock held, you need quite a different approach,
that would probably be much more complicated (a try_lock() in
enqueue(), but what if you don't get the lock, do you use a per-thread
thread-specific queue to buffer until the next time the thread gets
the lock without waiting? What if that buffer is full? ...?)

> 1) use of counting posix semaphores instead of pthread condition variables
> which have to maintain their own counting externally with extra locks

As outlined above, in the case of a thread-safe queue I believe the
semaphore to be superfluous. In my 15 years of multi-threaded
programming I have never really needed a counting semaphore, either
(a) there's something to wait for and it naturally maps to a mutex
w/condition-variable, or (b) it's genuinely just a shared integer, in
which case atomic operations are sufficient. Rarely there's some
special-cased lock-free stuff.

>> Anyhow, we got 1.2.0 stable by short-circuiting _logsys_log_vprintf()
>> by inserting
>>
>>   syslog(rec_ident, "%s", logsys_print_buffer);
>>   return;
>>
>> at line 1335. Without this line, corosync eventually goes into a 100%
>> CPU loop which seems to be triggered by a non-log entry in the
>> flight-recorder with length 0, so that it doesn't get skipped
>> correctly (this is not fully confirmed, it's our current hypothesis).
>
> With debug: on we noticed the 100% spin when used with pacemaker.  This was
> addressed in 1.2.2.  I understand flight recorder looks complicated.  I am a
> believer in KISS, but sometimes complication is necessary to meet
> requirements.

On part of the "complexity" that I referred to is on the micro-level,
where a bit more layering of code and smaller functions could greatly
improve readability. I know that C doesn't reach the code density of
C++, where my functions typically have 3-10 lines, but that can be
perceived as stylistic issue, so better keep that ball low.

> I have very specific requirements based upon 8+ years of debugging field
> deployments of this code base:
>
> 1. Record all log events in a buffer so they may be replayed at a segfault
> (with corosync-fplay)
> 2. Do non-block output to the three log output targets (stderr, syslog,
> file) (this is why there is a separate logging thread)
> 3. Allow subsystems to be configured independently for various parts of the
> software (so for example, totem messages can log to file at debug level).
> 4. low overhead - high performance - this is achieved by all log events
> going to memory and other log events going to disk or syslog.

For performance, log_printf_to_logs_wthread() could fold the two calls
to malloc() into one, would also shorten the code.

> A simple solution of using syslog api doesn't meet any of the other
> requirements 1-4 above.

Yes, the syslog was a hack to narrow down the error, and to continue testing...

Regards, Colin
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais

Reply via email to