On Dec 7, 2012, at 10:39 , Mark Wu <wu...@linux.vnet.ibm.com> wrote:

> On 12/05/2012 10:23 PM, ybronhei wrote:
>> While investigating an issue where starting 200 VMs at the same time takes
>> hours for a reason we have not yet identified, we thought about moving the
>> collection of statistics outside vdsm.
>> 
>> It can help because stat collection runs in internal vdsm threads that can
>> take a considerable amount of time. I'm not sure it would help with the
>> issue of starting many VMs simultaneously, but it might improve vdsm's
>> response time.
>> 
>> Currently we start a thread for each VM and then collect stats on it at
>> constant intervals, and it must affect vdsm if we have 200 threads like
>> this, each of which can take some time. For example, if we have connection
>> errors to storage and can't receive its response, all 200 threads can get
>> stuck and block other threads (GIL issue).
> As far as I know, the oop design is meant to resolve the problem you
> describe. However, I don't understand how the GIL can cause this problem.
> Python should release the GIL before executing any instruction that involves
> I/O. I did some tests before and found that the other threads can continue
> to run while one thread is stuck on I/O.
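
A test of the kind described in the quoted paragraph could look roughly like
the sketch below. It is not the test that was actually run. time.sleep stands
in for a blocking I/O call (an assumption made for illustration), and the
names blocked_io and count_while_blocked are hypothetical.

```python
import threading
import time


def blocked_io(seconds):
    # Assumption: time.sleep stands in for a blocking I/O call. Like
    # blocking I/O in CPython, it releases the GIL while it waits.
    time.sleep(seconds)


def count_while_blocked(seconds=0.5):
    """Return how many loop iterations a second thread completed while
    the first thread was blocked."""
    progress = {"n": 0}
    stop = threading.Event()

    def counter():
        # Pure-Python work that needs the GIL to make progress.
        while not stop.is_set():
            progress["n"] += 1

    io_thread = threading.Thread(target=blocked_io, args=(seconds,))
    cpu_thread = threading.Thread(target=counter)
    io_thread.start()
    cpu_thread.start()
    io_thread.join()   # wait for the simulated I/O to finish
    stop.set()         # then tell the counting thread to exit
    cpu_thread.join()
    return progress["n"]
```

A non-zero return value only shows that one blocked thread does not stop the
others from running. It says nothing about how much contention 200 such
threads create.
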
AFAIU they are not stuck, but the contention is so high that it slows
everything down significantly. More importantly, the immediate polling for
statistics right after a libvirt createVM call slows down the whole system.
The external-process solution would help because we could delay statistics
collection by making it asynchronous to VM creation, e.g. scan every 5
seconds for the list of VMs and update the vms-to-gather-stats-from list.
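
A minimal sketch of that periodic scan, assuming a hypothetical list_vms
callable that returns the IDs of the VMs currently defined on the host. The
names StatsTargets and run_scanner are made up for illustration and are not
existing vdsm or libvirt APIs.

```python
import threading


class StatsTargets:
    """Keeps the vms-to-gather-stats-from list (hypothetical helper)."""

    def __init__(self, list_vms):
        # list_vms: assumed callable returning an iterable of VM IDs.
        self._list_vms = list_vms
        self._lock = threading.Lock()
        self._targets = set()

    def rescan(self):
        """Refresh the target set; return (added, removed) VM IDs."""
        current = set(self._list_vms())
        with self._lock:
            added = current - self._targets
            removed = self._targets - current
            self._targets = current
        return added, removed

    def snapshot(self):
        """Return a copy of the current target set."""
        with self._lock:
            return set(self._targets)


def run_scanner(targets, stop_event, interval=5.0):
    # Event.wait returns True once stop_event is set, which ends the loop.
    # Otherwise it times out after `interval` seconds and we rescan.
    while not stop_event.wait(interval):
        targets.rescan()
```

Because the first rescan happens only after one interval has elapsed, a newly
created VM is not polled in the same moment as the createVM call. A single
collector can then walk snapshot() instead of running one thread per VM.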


>> 
>> I wanted to know what you think about it and whether you have a better
>> solution that avoids starting so many threads. And is splitting vdsm a good
>> idea here? At first glance, my opinion is that it can help, and it would be
>> nice to have a vmStatisticService that runs and writes the VMs' status to a
>> separate log.
>> 
>> The problem with this solution is that if those interval functions need to
>> communicate with internal parts of vdsm to set values or start internal
>> processes when something has changed, that depends on the stat function,
>> and I'm not sure that the stat function should control internal flows.
>> Today we rely on this method to recognize connectivity errors, but we can
>> add a polling mechanism for those issues (which can raise the same problems
>> we are trying to deal with).
>> 
>> I would like to hear your ideas and comments. Thanks.
>> 
> 

_______________________________________________
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
