On 2016-03-01 11:25, Ahmed S. Darwish wrote:
Hi :-)

On Tue, Feb 23, 2016 at 11:45:41AM +0200, Tanu Kaskinen wrote:
On Tue, 2016-02-23 at 11:19 +0200, Tanu Kaskinen wrote:
On Tue, 2016-02-23 at 07:55 +0200, Ahmed S. Darwish wrote:
Hello everyone,

As you probably know by now, the second patch in the memfd series
was created to transform the global, shared by all clients,
srbchannel mempool to a "per-client" one. [1]

The reason for this transition is to avoid data leaks between
clients.
In a follow-up patch series, it's even hoped that the global core
mempool will also be transformed to the per-client model. [2]


==> Current problem:
--------------------

By injecting faults at the system-call level, it was discovered
that the current design of per-client mempools leads to a
reproducible memory fault on certain error-recovery paths.

The fault is caused by a design issue, rather than a small coding
error.


==> Why are you dedicating a full mail to this?
-----------------------------------------------

The problematic issue discovered implies that a bigger design
change is required; your input is very much needed :-)


==> What was the current design?
--------------------------------

To do the per-client mempool transformation, a simple decision
was taken: let 'pa_native_connection' own the per-client mempool
in question.

Why? Because when a client connects, a 'native connection' object
is created for it by default. Inside this object is the pstream
pipe and other per-client resources. So this seemed like the best
place for a per-client mempool:

   /* protocol-native.c - Called for each client connection */
   void pa_native_protocol_connect(...) {
       pa_native_connection *c;

       ...
       c = pa_msgobject_new(pa_native_connection);
       c->pstream = pa_pstream_new(...);
       c->mempool = pa_mempool_new(...);    // per-client mempool
       ...
   }

And 'naturally', the pool should be deallocated when the
connection closes:

   /* protocol-native.c */
   static void native_connection_free(pa_object *o) {
       ...
       pa_pdispatch_unref(c->pdispatch);
       pa_pstream_unref(c->pstream);

       // The srbchannel allocates from this per-client pool,
       // so free it only after the pstream is freed
       pa_mempool_free(c->mempool);
       ...
   }

All looked fine and dandy ..


==> And where is the problem exactly?
-------------------------------------

By injecting failures in sendmsg(2), thus reaching up to the
iochannel and pstream layers, the following leak is shown:

   E: Memory pool destroyed but not all mem blocks freed! 1 remain

and a segfault is then produced due to a use-after-free operation
on the leaked memblock.

The sequence of failure, bottom-up, is as follows:

   -> sendmsg(2) fails with -EACCES
     -> pa_iochannel_write_with_creds() returns failure
       -> pstream do_write() fails
         -> do_pstream_read_write() fails, jumps to error recovery

And the error recovery goes as this:

   -> pstream die callback (p->callback) is called
   -> That's protocol-native.c pstream_die_callback()
     -> native_connection_unlink    // Stops srbchannel, etc.
       -> pa_pstream_unlink         // Reset all pstream callbacks
       -> native_connection_free    // Per-client mempool _freed!_
   -> pa_pstream_unlink             // Second time, does nothing..
   -> pa_pstream_free
     -> pa_queue_free(p->send_queue, item_free)
       -> 1. PACKET item found .. safely removed
       -> 2. MEMBLOCK item found .. from freed srbchannel mempool
             BOOM!!! segfault

SUMMARY : As seen above, a per-client mempool's lifetime must be
a superset of the pstream's lifetime. Putting the per-client pool
in the native_connection object provided only an _illusion_ of
such a superset. [*]

During error recovery, stale memblocks remain in the pstream's send
queue long after __all the per-client objects__ have been removed.

[*] Check native_connection_free() under "What was the current
     design?" for why this illusion was maintained.


==> Suggested solutions?
------------------------

The situation is a bit tricky.

Where can we put the per-client pool while ensuring a correct
lifecycle – especially during error recovery?

The safest option, it seems, is to let the per-client pool be
owned by the pstream object, thus the pool can be freed at the
very end of pstream_free() itself. Is such a solution acceptable?

My first reaction is: why is the pstream object reference counted?


It seems this was just a regular convention rather than a conscious
design decision. This is evidenced by there being only __two__
pa_pstream_ref() calls in the entire tree, both in pstream.c: one
in do_pstream_read_write() and one in srb_callback(). In both
places it's just a ref/unref pair done in a local context.

I recently added the one in srb_callback for a good reason:

commit f277f2c5094fb32c5d879923960eb807b3b1c535
Author: David Henningsson <[email protected]>
Date:   Fri Oct 16 22:12:32 2015 +0200

    pstream: Fix use-after-free in srb_callback

...please make sure this bug does not reappear if you change things around :-)

--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic
_______________________________________________
pulseaudio-discuss mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss
