On Wed, Sep 19, 2012 at 5:34 PM, Jan Kiszka <jan.kis...@siemens.com> wrote:
> On 2012-09-19 11:00, liu ping fan wrote:
>> On Wed, Sep 19, 2012 at 4:06 PM, Avi Kivity <a...@redhat.com> wrote:
>>> On 09/19/2012 06:02 AM, liu ping fan wrote:
>>>> Currently, cpu_physical_memory_rw() can be used directly or indirectly
>>>> by the mmio-dispatcher to access another device's memory region. This can
>>>> cause problems when adopting a per-device private lock.
>>>>
>>>> Background:
>>>> http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg01481.html
>>>> For a short summary, refer to:
>>>> http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg01878.html
>>>>
>>>>
>>>> --1st. the recursive locking of the biglock.
>>>> If we leave c_p_m_rw() as it is, i.e. with no lock inside, then we can
>>>> have the following (a section of the whole call chain, with
>>>> private_lockA):
>>>>   lockA-mmio-dispatcher --> hold biglock --> c_p_m_rw() --->
>>>> Before c_p_m_rw(), we drop private_lockA to avoid the possibility of
>>>> deadlock. But we can not prevent this chain from nesting, or from calling
>>>> into another lockB-mmio-dispatcher. So we can not avoid the possibility of
>>>> nested locking of the biglock. Another important factor is that we break
>>>> the lock ordering: private_lock --> biglock.
>>>> All of this requires us to push the acquisition of the biglock into
>>>> c_p_m_rw(); a wrapper can not help here.
>>>
>>> I agree that this is unavoidable.
>>>
>>>>
>>>> --2nd. c_p_m_rw(), sync or async?
>>>>
>>>> If we convert all of the devices to be protected by refcounts, then we
>>>> can have
>>>>   //no big lock
>>>>   c_p_m_rw()
>>>>   {
>>>>       devB->ref++;
>>>>       {
>>>>           --------------------------------------->pushed onto another thread.
>>>>           lock_privatelock
>>>>           mr->ops->write();
>>>>           unlock_privatelock
>>>>       }
>>>>       wait_for_completion();
>>>>       devB->ref--;
>>>>   }
>>>> This model lets c_p_m_rw() present itself as a SYNC API. But currently,
>>>> we mix the biglock and the private locks together, and wait_for_completion()
>>>> may block the release of the big lock, which finally causes deadlock. So
>>>> we can not simply rely on this model.
>>>> Instead, we need to classify the calling scenarios into three cases:
>>>>   case1. lockA-dispatcher ---> lockB-dispatcher     // can use the
>>>>          async+completion model
>>>>   case2. lockA-dispatcher ---> biglock-dispatcher   // sync, but can
>>>>          cause nested locking of the biglock
>>>>   case3. biglock-dispatcher ---> lockB-dispatcher   // async, to avoid
>>>>          the lock-ordering problem (as to the completion, it needs to be
>>>>          placed outside the top-level biglock, and that is hard to do.
>>>>          Suggest changing this to case 1; or, for now, just leave it async)
>>>>
>>>> This new model requires that the biglock can be nested.
>>>
>>> I think changing to an async model is too complicated.  It's difficult
>>> enough already.  Isn't dropping private locks + recursive big locks
>>> sufficient?
>>>
>> I think that "dropping private locks + recursive big locks" only covers
>> case 2. Most importantly, it does not cover case 3, which breaks the
>> lock-ordering rule "private-lock --> biglock". Scenario:
>> devA_lock --> (devX_with-biglock ---> devB_lock).
>> I just want to classify these cases and post them for discussion. Maybe we
>> can manage without async.
>
> What about the following:
>
> What we really need to support in practice is an MMIO access that triggers
> a RAM access of a device model. Scenarios where a device access triggers
> another MMIO access could likely just be rejected without causing trouble.
>
> So, when we dispatch a request to a device, we mark that the current
> thread is in an MMIO dispatch and reject any follow-up c_p_m_rw that does
> _not_ target RAM, i.e. is another, nested MMIO request - independent of
> its destination. How much of the known issues would this solve? And what
> would remain open?
>
This makes things easier and more realistic. I will dig into the code to see
whether there are exceptions or not.
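To check that I read the proposal correctly, here is a rough, self-contained
sketch of such a guard. This is not QEMU code; every name in it
(in_mmio_dispatch, MockRegion, access_region, the devA/devB handlers) is made
up for illustration and only stands in for the real dispatch path inside
c_p_m_rw():

/*
 * A per-thread flag marks "we are inside an MMIO dispatch"; a nested
 * access that does not target RAM is rejected instead of being forwarded
 * to another device model.
 */
#include <stdbool.h>
#include <stdio.h>

static __thread bool in_mmio_dispatch;   /* per-thread, needs no lock */

typedef struct MockRegion {
    bool is_ram;
    const char *name;
    void (*write)(void);                 /* stand-in for mr->ops->write */
} MockRegion;

/* Stand-in for the point in c_p_m_rw() where addr was resolved to a region. */
static bool access_region(MockRegion *mr)
{
    if (mr->is_ram) {
        return true;                     /* plain RAM: always allowed */
    }
    if (in_mmio_dispatch) {
        fprintf(stderr, "rejected nested MMIO access to %s\n", mr->name);
        return false;                    /* nested MMIO from a device model */
    }
    in_mmio_dispatch = true;             /* top-level MMIO: mark the thread */
    mr->write();                         /* device handler runs here, under
                                            its private lock */
    in_mmio_dispatch = false;
    return true;
}

static MockRegion ram  = { .is_ram = true,  .name = "ram" };
static MockRegion devB = { .is_ram = false, .name = "devB-mmio" };

/* devA's write handler: touches RAM (ok), then tries MMIO to devB. */
static void devA_write(void)
{
    access_region(&ram);                 /* allowed                */
    access_region(&devB);                /* rejected by the guard  */
}

static MockRegion devA = { .is_ram = false, .name = "devA-mmio",
                           .write = devA_write };

int main(void)
{
    access_region(&devA);                /* guest MMIO write to devA */
    return 0;
}

If that matches your intention, then case 3 (biglock-dispatcher --->
lockB-dispatcher) can no longer be reached through c_p_m_rw() at all, so the
lock-ordering problem disappears; what remains is to check whether any
existing device model really depends on device-to-device MMIO.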
Thanks and regards,
pingfan

> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux