On 10/01/2013 06:35 PM, Dan Kenigsberg wrote:
> On Tue, Oct 01, 2013 at 02:33:00PM +0100, Lee Yarwood wrote:
>> On 10/01/2013 09:00 AM, Dan Kenigsberg wrote:
>>> It is preferred to post patches to gerrit.ovirt.org.
>>
>> Apologies for jumping in, David, but I've pushed this here for now:
>>
>> http://gerrit.ovirt.org/19741
> 
> Thanks!
> 
>>
>>> On Tue, Oct 01, 2013 at 01:18:25PM +1000, David Gibson wrote:
>>>> At present, if the super vdsm server dies with an exception inside
>>>> Python's multiprocessing module, then it will not usually produce any
>>>> useful debugging output.
>>>
>>> For our context - when do you notice such supervdsm deaths?
>>> Is it frequent? What is the cause?
>>
>> BZ#1011661 & BZ#1010030 downstream.
> 
> Ok, I can see them, dig into them and find an answer to my question. But
> it's not fair to the wider community of users and partners to cite
> private bugs.
> 
> https://www.berrange.com/posts/2012/06/27/thoughts-on-improving-openstack-git-commit-practicehistory/

Apologies Dan,

I believe David was referring to the public BZ#1011661, which I understand
has been attributed to the following change merged upstream in May:

http://gerrit.ovirt.org/#/c/14998

Hopefully David or Tomas can confirm this.

The private bug BZ#1010030 that I also referenced is slightly different
and covers the following supervdsm tracebacks seen by customers downstream :

~~~
Thread-1029158::ERROR::2013-09-10 02:46:29,250::domainMonitor::225::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 977c6c73-5ca2-478c-9ffe-6a72d74a09d4 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 201, in _monitorDomain
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
  File "/usr/share/vdsm/storage/sdc.py", line 120, in _realProduce
  File "/usr/share/vdsm/storage/misc.py", line 1072, in helper
  File "/usr/share/vdsm/storage/misc.py", line 1057, in __call__
  File "/usr/share/vdsm/storage/sdc.py", line 83, in refreshStorage
  File "/usr/share/vdsm/storage/multipath.py", line 73, in rescan
  File "/usr/share/vdsm/supervdsm.py", line 76, in __call__
  File "/usr/share/vdsm/supervdsm.py", line 67, in <lambda>
  File "<string>", line 2, in forceScsiScan
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 216, in serve_client
KeyError: '2baea50'
---------------------------------------------------------------------------
---------------------------------------------------------------------------
~~~

This BZ is still a WIP and thus any additional debug logging we could
get from the multiprocessing module would help.
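
For illustration only, here is a minimal sketch of how the multiprocessing
module's own logger could be switched to debug level. It is not the patch
posted to gerrit, and the helper name enable_multiprocessing_debug and the
log format are assumptions made up for this example. It relies only on
multiprocessing.get_logger() from the standard library.

```python
import logging
import multiprocessing


def enable_multiprocessing_debug(handler=None, level=logging.DEBUG):
    """Attach a handler to multiprocessing's internal logger.

    Hypothetical helper, not part of vdsm. By default the module's
    logger has no handler, so its debug messages are silently dropped.
    """
    logger = multiprocessing.get_logger()
    logger.setLevel(level)
    if handler is None:
        # Default to stderr; a real daemon would likely pass its own handler.
        handler = logging.StreamHandler()
    # Assumed format, loosely modelled on the "::"-separated log lines above.
    handler.setFormatter(logging.Formatter(
        "%(processName)s::%(levelname)s::%(asctime)s::%(name)s::%(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling enable_multiprocessing_debug() early in the server process, before
the manager starts serving clients, should make the module's internal debug
messages visible on stderr or in whichever handler is passed in.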

Again my apologies for not including these details previously.

Lee
-- 

Lee Yarwood
Senior Software Maintenance Engineer
Red Hat UK Ltd
200 Fowler Avenue, Farnborough Business Park, Farnborough, Hants GU14 7JP

Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham (US), Paul Hickey (Ireland), Matt Parson
(US), Charles Peters (US)

GPG fingerprint : A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76
