Yuriy, I also think that if we use a multiprocessing proxy object, the green thread will not switch while we are calling libguestfs. Is that correct?
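For clarity, a minimal sketch of the Manager-plus-proxy usage under discussion; GuestFSWorker, LibguestfsManager, and setup() below are placeholders for the real libguestfs wrapper, not actual Nova code:

    # A minimal sketch, not Nova code: GuestFSWorker stands in for the real
    # libguestfs wrapper; only the Manager/proxy mechanics are shown.
    from multiprocessing.managers import BaseManager


    class GuestFSWorker(object):
        """Runs inside the manager's child process, so fds opened while
        libguestfs runs never exist in the parent (Nova) process."""

        def setup(self, imgfile):
            # Placeholder: real code would create guestfs.GuestFS(),
            # call add_drive()/launch(), etc. Exceptions raised here
            # propagate back to the caller through the proxy.
            return 'launched %s' % imgfile


    class LibguestfsManager(BaseManager):
        pass


    LibguestfsManager.register('GuestFSWorker', GuestFSWorker)

    if __name__ == '__main__':
        manager = LibguestfsManager()
        manager.start()                   # forks one long-lived child process
        worker = manager.GuestFSWorker()  # proxy object; calls go over a pipe
        print(worker.setup('/tmp/disk.img'))
        manager.shutdown()

This only shows the stdlib mechanics; whether the blocking proxy call lets other green threads run is exactly the open question above.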
On Fri, Jun 6, 2014 at 2:44 AM, Qin Zhao <[email protected]> wrote:
> Hi Yuriy,
>
> I read the multiprocessing source code just now. Now I feel it may not solve this problem very easily. For example, let us assume that we will use the proxy object in the Manager's process to call libguestfs. In manager.py, I see that it needs to create a pipe before forking the child process. The write end of this pipe is required by the child process.
>
> http://sourcecodebrowser.com/python-multiprocessing/2.6.2.1/classmultiprocessing_1_1managers_1_1_base_manager.html#a57fe9abe7a3d281286556c4bf3fbf4d5
>
> And in Process._bootstrap(), I think we will need to register a function to be called by _run_after_forkers(), in order to close the fds inherited from the Nova process.
>
> http://sourcecodebrowser.com/python-multiprocessing/2.6.2.1/classmultiprocessing_1_1process_1_1_process.html#ae594800e7bdef288d9bfbf8b79019d2e
>
> And we also cannot close the write-end fd created by the Manager in _run_after_forkers(). One feasible way may be to get that fd from the 5th element of the _args attribute of the Process object and then skip closing it... I have not investigated whether the Manager needs other fds besides this pipe. Personally, I feel such an implementation will be a little tricky and risky, because it depends tightly on the Manager code. If the Manager opens other files, or changes the argument order, our code will fail to run. Am I wrong? Is there any safer way?
>
>
> On Thu, Jun 5, 2014 at 11:40 PM, Yuriy Taraday <[email protected]> wrote:
>
>> Please take a look at https://docs.python.org/2.7/library/multiprocessing.html#managers - everything is already implemented there.
>> All you need is to start one manager that would serve all your requests to libguestfs. The implementation in the stdlib will provide you with all exceptions and return values with minimal code changes on the Nova side.
>> Create a new Manager, register a libguestfs "endpoint" in it and call start(). It will spawn a separate process that will speak with the calling process over a very simple RPC.
>> From the looks of it, all you need to do is replace the tpool.Proxy calls in the VFSGuestFS.setup method with calls to this new Manager.
>>
>>
>> On Thu, Jun 5, 2014 at 7:21 PM, Qin Zhao <[email protected]> wrote:
>>
>>> Hi Yuriy,
>>>
>>> Thanks for reading my bug! You are right. Python 3.3 or 3.4 should not have this issue, since they can secure the file descriptors. Before OpenStack moves to Python 3, we may still need a solution. Calling libguestfs in a separate process seems to be one way. That way, the Nova code can close those fds by itself, without depending upon CLOEXEC. However, it will be an expensive solution, since it requires a lot of code changes. At least we need to write code to pass return values and exceptions between the two processes. That will make this solution very complex. Do you agree?
>>>
>>>
>>> On Thu, Jun 5, 2014 at 9:39 PM, Yuriy Taraday <[email protected]> wrote:
>>>
>>>> This behavior of os.pipe() has changed in Python 3.x, so it won't be an issue on newer Python (if only it were accessible to us).
>>>>
>>>> From the looks of it, you can mitigate the problem by running libguestfs requests in a separate process (multiprocessing.managers comes to mind). This way the only descriptors the child process could theoretically inherit would be the long-lived pipes to the main process, although they won't leak, because they should be marked with CLOEXEC before any libguestfs request is run. The other benefit is that this separate process won't be busy opening and closing tons of fds, so the problem with inheriting will be avoided.
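A tiny illustration of the os.pipe() behaviour referenced above, assuming Python 2.x, where pipe descriptors are inheritable unless FD_CLOEXEC is set by hand (Python 3.4 changed that default):

    # Illustration only: on Python 2.x, descriptors from os.pipe() survive a
    # fork+exec in a child unless FD_CLOEXEC is set explicitly.
    import fcntl
    import os

    r, w = os.pipe()

    # A leaked copy of the write end in some exec'd child keeps readers of
    # `r` from ever seeing EOF, which is the deadlock scenario in the bug.
    flags = fcntl.fcntl(w, fcntl.F_GETFD)
    fcntl.fcntl(w, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

    print(fcntl.fcntl(w, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)  # 1 = close-on-exec set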
>>>> On Thu, Jun 5, 2014 at 2:17 PM, laserjetyang <[email protected]> wrote:
>>>>
>>>>> Will this Python patch fix your problem? http://bugs.python.org/issue7213
>>>>>
>>>>>
>>>>> On Wed, Jun 4, 2014 at 10:41 PM, Qin Zhao <[email protected]> wrote:
>>>>>
>>>>>> Hi Zhu Zhu,
>>>>>>
>>>>>> Thank you for reading my diagram! I need to clarify that this problem does not occur during data injection. Before creating the ISO, the driver code will extend the disk. Libguestfs is invoked in that time frame.
>>>>>>
>>>>>> And now I think this problem may occur at any time, if the code uses tpool to invoke libguestfs and an external command is executed in another green thread simultaneously. Please correct me if I am wrong.
>>>>>>
>>>>>> I think one simple solution for this issue is to call the libguestfs routine in a green thread, rather than in another native thread. But that will hurt performance very much, so I do not think it is an acceptable solution.
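For context, this is roughly how a blocking handle gets wrapped with eventlet's tpool so that its calls run in a native thread; SlowHandle below is only a stand-in for the real guestfs.GuestFS handle, not actual Nova code:

    # Rough sketch of the tpool wrapping being discussed; SlowHandle is a
    # stand-in for guestfs.GuestFS.
    import time

    from eventlet import tpool


    class SlowHandle(object):
        def launch(self):
            time.sleep(1)      # simulate a blocking libguestfs call
            return 'launched'


    proxy = tpool.Proxy(SlowHandle())

    # Each method call on the proxy is executed in a native worker thread,
    # so other green threads keep running while launch() blocks. Calling
    # launch() directly from a green thread would block the whole hub instead.
    print(proxy.launch())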
>>>>>> On Wed, Jun 4, 2014 at 12:00 PM, Zhu Zhu <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Qin Zhao,
>>>>>>>
>>>>>>> Thanks for raising this issue and for the analysis. According to the issue description and the scenario in which it happens (https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhYfSkkVc/pub?w=960&h=720), the issue is very likely to happen if multiple KVM instances are spawned concurrently (with both config drive and data injection enabled). In the libvirt/driver.py _create_image method, right after the ISO is made with "cdb.make_drive", the driver will attempt "data injection", which launches libguestfs in another thread.
>>>>>>>
>>>>>>> It looks like there were also a couple of libguestfs hang issues on Launchpad, listed below. I am not sure whether libguestfs itself could have some mechanism to free/close the fds inherited from the parent process instead of requiring an explicit tear-down call. Maybe open a defect against libguestfs to see what they think?
>>>>>>>
>>>>>>> https://bugs.launchpad.net/nova/+bug/1286256
>>>>>>> https://bugs.launchpad.net/nova/+bug/1270304
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> Zhu Zhu
>>>>>>> Best Regards
>>>>>>>
>>>>>>>
>>>>>>> From: Qin Zhao <[email protected]>
>>>>>>> Date: 2014-05-31 01:25
>>>>>>> To: OpenStack Development Mailing List (not for usage questions) <[email protected]>
>>>>>>> Subject: [openstack-dev] [Nova] nova-compute deadlock
>>>>>>> Hi all,
>>>>>>>
>>>>>>> When I run Icehouse code, I encountered a strange problem: the nova-compute service becomes stuck when I boot instances. I reported this bug in https://bugs.launchpad.net/nova/+bug/1313477.
>>>>>>>
>>>>>>> After thinking about it for several days, I feel I know its root cause. This bug should be a deadlock caused by pipe fd leaking. I drew a diagram to illustrate the problem: https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhYfSkkVc/pub?w=960&h=720
>>>>>>>
>>>>>>> However, I have not found a very good solution to prevent this deadlock. This problem involves the Python runtime, libguestfs, and eventlet, so the situation is a little complicated. Is there any expert who can help me look for a solution? I will appreciate your help!
>>>>>>>
>>>>>>> --
>>>>>>> Qin Zhao

--
Qin Zhao
_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
