Re: [vdsm] moving the collection of statistics to external process
On 12/05/2012 10:23 PM, ybronhei wrote:
> As part of an issue where starting 200 VMs at the same time takes hours for an as-yet-undetermined reason, we thought about moving the collection of statistics outside of vdsm. It could help, because stat collection runs in internal vdsm threads that can consume quite a bit of time. I'm not sure it would help with the issue of starting many VMs simultaneously, but it might improve vdsm's responsiveness.
>
> Currently we start a thread for each VM and then collect stats on it at constant intervals; having 200 such threads, each of which can take some time, must affect vdsm. For example, if we have connection errors to storage and cannot receive its response, all 200 threads can get stuck and lock out other threads (GIL issue).

As far as I know, the design of oop (vdsm's out-of-process file operations) tries to resolve the problem you state. However, I don't understand how the GIL can cause this problem: Python should release the GIL before executing any I/O-bound instruction. I did some tests before and found that the other threads can continue to run while one thread is stuck on I/O.

> I wanted to know what you think about it, and whether you have a better solution that avoids starting so many threads. And is splitting vdsm a good idea here? At first look, my opinion is that it could help, and it would be nice to have a vmStatisticsService that runs and writes the VMs' status to a separate log.
>
> The problem with this solution is that if those interval functions need to communicate with internal parts of vdsm, to set values or to start internal processes when something has changed, then vdsm depends on the stat function... and I'm not sure that a stat function should control internal flows. Today we rely on this method to recognize connectivity errors, but we could add a polling mechanism for those issues (which could raise the same problems we are trying to deal with...).
>
> I would like to hear your ideas and comments. Thanks.
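The GIL behavior described above is easy to test outside vdsm. Below is a minimal, self-contained sketch (plain CPython, not vdsm code) of that kind of experiment: one thread blocks on real I/O while another keeps doing pure-Python work, showing that the interpreter releases the GIL around blocking I/O calls.

    import os
    import threading
    import time

    counter = 0
    r, w = os.pipe()  # nothing is ever written, so a read blocks forever

    def blocked_on_io():
        os.read(r, 1)  # blocking read; the GIL is released while waiting

    def busy_worker():
        global counter
        while True:
            counter += 1

    threading.Thread(target=blocked_on_io, daemon=True).start()
    threading.Thread(target=busy_worker, daemon=True).start()

    time.sleep(2)
    print("iterations while the other thread was stuck on I/O:", counter)

If the blocked thread held the GIL, the counter would stay near zero; in practice it reaches many millions. So blocking I/O alone should not freeze the other stats threads; trouble is more likely when a stuck call sits in a C extension that does not release the GIL, or when hundreds of threads all wake at the same polling interval.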
Re: [vdsm] moving the collection of statistics to external process
On 12/06/2012 11:29 PM, Adam Litke wrote:
> On Thu, Dec 06, 2012 at 11:19:34PM +0800, Shu Ming wrote:
>> On 2012-12-6 4:51, Itamar Heim wrote:
>>> On 12/05/2012 10:33 PM, Adam Litke wrote:
>>>> On Wed, Dec 05, 2012 at 10:21:39PM +0200, Itamar Heim wrote:
>>>>> On 12/05/2012 10:16 PM, Adam Litke wrote:
>>>>>> On Wed, Dec 05, 2012 at 09:01:24PM +0200, Itamar Heim wrote:
>>>>>>> On 12/05/2012 08:57 PM, Adam Litke wrote:
>>>>>>>> On Wed, Dec 05, 2012 at 08:30:10PM +0200, Itamar Heim wrote:
>>>>>>>>> On 12/05/2012 04:42 PM, Adam Litke wrote:
>>>>>>>>>>> I wanted to know what you think about it, and whether you have a better solution that avoids starting so many threads. And is splitting vdsm a good idea here? At first look, my opinion is that it could help, and it would be nice to have a vmStatisticsService that runs and writes the VMs' status to a separate log.

>>>>>>>>>> Vdsm recently started requiring the MOM package. MOM also performs some host and guest statistics collection as part of the policy framework. I think it would be a really good idea to consolidate all stats collection into MOM. Then all stats become usable within the policy, and by vdsm for its own internal purposes. Today MOM has one stats collection thread per VM and one thread for the host stats. It has an API for gathering the most recently collected stats, which vdsm can use.

>>>>>>>>> isn't this what collectd (and its libvirt plugin) or pcp are already doing?

>>>>>>>> Lots of things collect statistics, but as of right now we're using MOM, and we're not yet using collectd on the host, right?

>>>>>>> I think we should have a single stats collection service and clients for it. I think mom and vdsm should get their stats from that service, rather than have either beholden to any new stats something needs to collect.

>>>>>> How would this work for collecting guest statistics? Would we require collectd to be installed in all guests running under oVirt?

>>>>> my understanding is that collectd is installed on the host, and uses collectd's libvirt plugin to collect guest statistics?

>>>> Yes, but some statistics can only be collected by making a call to the oVirt guest agent (e.g. guest memory statistics). The logical next step would be to write a collectd plugin for ovirt-guest-agent, but vdsm owns the connections to the guest agents and probably does not want to multiplex those connections, for many reasons (security being the main one).

>>> and some will come from qemu-ga, which libvirt will support? maybe a collectd vdsm plugin for the guest agent stats?

>> I am thinking of having collectd as a stand-alone service that collects the statistics from both ovirt-guest-agent and qemu-ga. Then collectd could export the information to the host's proc file system in a layered architecture, and MOM or other vdsm services could get the information from the proc file system like the other OS statistics exported on the host.

> You wouldn't use the host /proc filesystem for this purpose. /proc is an interface between userspace and the kernel; it is not for direct application use. The problem I see with hooking collectd up to ovirt-ga is that vdsm still needs a connection to ovirt-ga for things like shutdown and desktopLogin. Today vdsm owns the connection to the guest agent, and there is no nice way to multiplex that connection for use by multiple clients simultaneously.

Actually, I don't like collecting statistics from the guest agent. libvirt can now provide the statistics for vcpu, block, and network interfaces, so I think we should reconsider enabling the guest memory report in the virtio balloon driver. I am not sure whether async events are supported in QMP now. What do you think of it?

In vdsm and MOM we don't just collect statistics; we also need to perform appropriate actions on them. So we probably still need an output plugin for collectd that makes the data available to vdsm and MOM, and that generates an event to vdsm or MOM when the data reaches a given threshold. Just an idea; I am not sure how easy it would be to implement.
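The output-plugin idea above could be prototyped with collectd's own Python plugin interface. The sketch below is hypothetical: the Unix socket path, the JSON message shape, and the threshold are invented for illustration, and the "collectd" module exists only inside the daemon's embedded interpreter (loaded via collectd's python plugin).

    import json
    import socket

    import collectd  # provided by collectd's python plugin at runtime

    VDSM_SOCKET = "/var/run/vdsm/stats.sock"  # hypothetical vdsm/MOM endpoint
    MEM_THRESHOLD = 90.0                      # hypothetical percent threshold

    def _send(payload):
        s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            s.sendto(json.dumps(payload).encode(), VDSM_SOCKET)
        finally:
            s.close()

    def write_cb(vl, data=None):
        # vl is a collectd.Values object describing one dispatched sample
        sample = {
            "plugin": vl.plugin,             # e.g. "libvirt"
            "instance": vl.plugin_instance,  # e.g. the domain name
            "type": vl.type,
            "values": list(vl.values),
            "time": vl.time,
        }
        _send({"event": "sample", "data": sample})
        # crude threshold check: emit an event vdsm/MOM can react to
        if vl.type == "percent" and vl.values and vl.values[0] > MEM_THRESHOLD:
            _send({"event": "threshold-exceeded", "data": sample})

    collectd.register_write(write_cb)

A datagram socket keeps the plugin from blocking collectd's write path if the consumer is slow to read; in a real design the event side would probably go through vdsm's regular API rather than an ad-hoc socket.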
Re: [vdsm] moving the collection of statistics to external process
On 12/07/2012 12:39 PM, Mark Wu wrote:
> [...]
>
> Actually, I don't like collecting statistics from the guest agent. libvirt can now provide the statistics for vcpu, block, and network interfaces, so I think we should reconsider enabling the guest memory report in the virtio balloon driver. I am not sure whether async events are supported in QMP now. What do you think of it?
>
> In vdsm and MOM we don't just collect statistics; we also need to perform appropriate actions on them. So we probably still need an output plugin for collectd that makes the data available to vdsm and MOM, and that generates an event to vdsm or MOM when the data reaches a given threshold. Just an idea; I am not sure how easy it would be to implement.

should be easy for such stats; the question is what other items are reported by the current guest agent (say, the list of installed applications).
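For the numeric host-side stats, the data the collectd libvirt plugin gathers in C is also reachable from Python, which is roughly what a collectd read plugin for per-domain stats could look like. This is a sketch only: the plugin name and the choice of collectd types are illustrative, and error handling (reconnects, transient domains) is omitted.

    import collectd  # provided by collectd's python plugin at runtime
    import libvirt

    conn = None

    def init_cb():
        global conn
        conn = libvirt.openReadOnly("qemu:///system")

    def read_cb():
        for dom_id in conn.listDomainsID():
            dom = conn.lookupByID(dom_id)
            state, max_mem, mem, ncpu, cpu_time = dom.info()
            vl = collectd.Values(plugin="virt_py", plugin_instance=dom.name())
            vl.dispatch(type="memory", type_instance="used",
                        values=[mem * 1024])  # dom.info() reports KiB
            vl.dispatch(type="virt_cpu_total", values=[cpu_time])

    collectd.register_init(init_cb)
    collectd.register_read(read_cb)

The guest-agent-only items asked about above (e.g. the list of installed applications) don't fit collectd's numeric data model, which is an argument for keeping them on vdsm's guest-agent channel even if the numeric stats move out.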
Re: [vdsm] RFC: New Storage API
----- Original Message -----
From: Deepak C Shetty <deepa...@linux.vnet.ibm.com>
To: Saggi Mizrahi <smizr...@redhat.com>
Cc: Shu Ming <shum...@linux.vnet.ibm.com>, engine-devel <engine-de...@ovirt.org>, VDSM Project Development <vdsm-devel@lists.fedorahosted.org>, Deepak C Shetty <deepa...@linux.vnet.ibm.com>
Sent: Friday, December 7, 2012 12:23:15 AM
Subject: Re: [vdsm] RFC: New Storage API

> On 12/06/2012 10:22 PM, Saggi Mizrahi wrote:
>> ----- Original Message -----
>> From: Shu Ming <shum...@linux.vnet.ibm.com>
>> To: Saggi Mizrahi <smizr...@redhat.com>
>> Cc: VDSM Project Development <vdsm-devel@lists.fedorahosted.org>, engine-devel <engine-de...@ovirt.org>
>> Sent: Thursday, December 6, 2012 11:02:02 AM
>> Subject: Re: [vdsm] RFC: New Storage API
>>
>>> Saggi,
>>> Thanks for sharing your thoughts; I have some comments below.
>>>
>>> Saggi Mizrahi:
>>>> I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try to keep implementation details away and concentrate on how the API looks and how you use it.
>>>>
>>>> The first major change is in terminology: there is no longer a storage domain but a storage repository. This change is made because so many things are already called "domain" in the system, and this will make things less confusing for newcomers with a libvirt background. Another change is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed.
>>>>
>>>> connectStorageRepository(repoId, repoFormat, connectionParameters={}):
>>>> repoId - a transient name that will be used to refer to the connected domain; it is not persisted and doesn't have to be the same across the cluster.
>>>> repoFormat - similar to what used to be called type (e.g. localfs-1.0, nfs-3.4, clvm-1.2).
>>>> connectionParameters - format specific; will be used to tell VDSM how to connect to the repo.
>>>
>>> Where does repoId come from? I think repoId doesn't exist before connectStorageRepository() returns. Isn't repoId a return value of connectStorageRepository()?
>>
>> No, repoIds are no longer part of the domain; they are just a transient handle. The user can put whatever it wants there, as long as it isn't already taken by another currently connected domain.
>
> So what happens when a user mistakenly gives a repoId that is already in use? There should be something in the return value that specifies the error and/or the reason for it, so that the user can retry with a new/different repoId?

As I said, connect fails if the repoId is in use ATM.

>>>> disconnectStorageRepository(self, repoId)
>>>>
>>>> In the new API there are only images; some images are mutable and some are not. Mutable images are also called VirtualDisks; immutable images are also called Snapshots. There are no explicit templates: you can create as many images as you want from any snapshot. There are 4 major image operations:
>>>>
>>>> createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}):
>>>> targetRepoId - ID of a connected repo where the disk will be created
>>>> size - the size of the image you wish to create
>>>> baseSnapshotId - the ID of the snapshot you want to base the new virtual disk on
>>>> userData - optional data that will be attached to the new VD; could be anything that the user desires
>>>> options - options to modify VDSM's default behavior
>
> IIUC, I can use options to do storage offloads? For e.g., can I create a LUN that represents this VD on my storage array based on the 'options' parameter? Is this the intended way to use 'options'?

No, this has nothing to do with offloads. If by offloads you mean having other VDSM hosts do the heavy lifting, then that is what the autoFix=False option and the fix mechanism are for. If you are talking about advanced SCSI features (e.g. WRITE SAME), they will be used automatically whenever possible. In any case, how we manage LUNs (if they are even used) is an implementation detail.

>>>> returns the id of the new VD
>>>
>>> I think we will also need a function to check whether a VirtualDisk is based on a specific snapshot. Like: isSnapshotOf(virtualDiskId, baseSnapshotId)
>>
>> No, the design is that volume dependencies are an implementation detail. There is no reason for you to know that an image is physically a snapshot of another. Logical snapshots, template information, and any other information can be set by the user via the userData field available on every image.

>>>> createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}):
>>>> targetRepoId - the ID of a connected repo where the new snapshot will be created and where the original image exists as well
>>>> baseVirtualDiskId - the ID of a mutable image (VirtualDisk) you want to snapshot
>>>> userData - optional data that will be attached to the new Snapshot; could be anything that the user desires
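Putting the proposed calls together, a hypothetical caller-side flow would look like the sketch below. The signatures are the ones proposed in this thread; the "vdsm" client handle, the connection parameters, and the userData keys are placeholders invented for illustration.

    # "vdsm" is a hypothetical client handle exposing the proposed verbs
    repo_id = "nfsrepo1"  # transient handle chosen by the caller

    vdsm.connectStorageRepository(
        repoId=repo_id,
        repoFormat="nfs-3.4",
        connectionParameters={"server": "nas.example.com",
                              "export": "/vol/vms"})

    # Create a mutable image (VirtualDisk).
    disk_id = vdsm.createVirtualDisk(
        targetRepoId=repo_id,
        size=20 * 1024**3,  # 20 GiB
        userData={"name": "web-server-root"})

    # Freeze it into an immutable image (Snapshot). The logical
    # parent/child relationship is recorded in userData, since volume
    # dependencies are an implementation detail of the repository.
    snap_id = vdsm.createSnapshot(
        targetRepoId=repo_id,
        baseVirtualDiskId=disk_id,
        userData={"logicalParent": disk_id, "label": "golden-image"})

    # No explicit templates: any number of disks can be based on a snapshot.
    clone_id = vdsm.createVirtualDisk(
        targetRepoId=repo_id,
        size=20 * 1024**3,
        baseSnapshotId=snap_id,
        userData={"name": "web-server-clone", "logicalParent": snap_id})

    vdsm.disconnectStorageRepository(repo_id)

Because repo_id is a transient handle, a second connect with the same name fails while the first is connected, matching the "connect fails if the repoId is in use" answer above.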