Hi Florian,

Florian Zumbiehl wrote:
> Hi,
>
> while porting chan_visdn 0.18.3 to OpenPBX, I noticed in the generator
> code a number of race conditions, a deadlock with a non-working workaround,
> basically turning it into a delayed race condition, a memory leak, and
> an IMO somewhat strange design in the timing. The kindof-deadlock keeps
> chan_visdn from working properly.
>
> As I don't exactly know why things are constructed the way they are,
> I don't have much of a clue what strategy would be appropriate for
> resolving this, so let me just explain the problems I found:
>
> 1. The existence of opbx_generator_is_active() probably simply ought
>    to be considered a bug. All current uses seem to introduce race
>    conditions and I can't think of many use cases where this would not
>    be the case. opbx_generator_deactivate() is idempotent anyway, so
>    the use of opbx_generator_is_active() usually is simply superfluous,
>    unless the race condition is the desired behaviour.
>
> 2. opbx_generator_deactivate() would have a potential deadlock if
>    there wasn't the 1-second timeout to trying to synchronize to the
>    completion of the deactivation of the generator, due to the fact
>    that it can be called with the channel lock held (which is what
>    chan_visdn does), which set_format() needs to acquire in the
>    generator thread, too, before it can complete the deactivation.
>
>    Now, on the one hand, simply continuing if the synchronization
>    doesn't happen within a certain amount of time, obviously simply
>    turns this into a race condition, which probably isn't much better
>    than a deadlock. And on the other hand, it's not really acceptable
>    that answering a visdn channel takes a whole second because of
>    this.
>
>    The "obvious" solution of moving the call to the generator's
>    release function to opbx_generator_deactivate() and thus relying on
>    the recursive mutexes doesn't work, as opbx_generator_deactivate()
>    would have to wait for the generator to cease generation. While
>    waiting, it obviously still could be holding the channel lock,
>    which also could be tried to be acquired by the generator thread
>    in opbx_write() before it would be able to notice that it was expected
>    to stop, thus again resulting in a deadlock.
>
> 3. Well, I don't quite understand why the generator is timed by
>    gettimeofday()!? After all, that clock certainly doesn't have any
>    connection with the clock that governs playback, which would be some
>    DAC's crystal or the PSTN's clock, and thus usually will deviate
>    from it. With VoIP, it can be somewhat difficult to get hold of
>    that DAC's clock, of course, but I'd at least have expected the
>    timing to be done as late as possible, and thus in particular
>    independent of the source, rather than at the very source of the
>    signal.
>
>    Also, the timing isn't even done according to gettimeofday() while
>    switching generators.
>
> 4. Finally, one fix. Below you find a patch for a potential
>    memory leak in opbx_generator_activate().
>   
I haven't had time to study the generator code, so I don't really know 
about the kind of internal problems you describe. However, there is one 
area I can certainly comment on - the use of gettimeofday for timing. 
There is a solid reason why gettimeofday needs to be used, as it is the 
only source of timing for RTP and IAX streams. However, the generator 
needs to dynamically choose its timing source, so it uses the hard 
timing from a channel where such a thing exists. I know this is missing, 
and it is unacceptable. Recently in the rxfax and txfax code CtRiX made 
that code dynamically adapt between using or not using a generator, so a 
generator is only used for IP streams. This is hardly a clean solution, 
though, and the generator really needs to handle these cases.

Steve


_______________________________________________
Openpbx-dev mailing list
[email protected]
http://lists.openpbx.org/mailman/listinfo/openpbx-dev

Reply via email to