Re: [vdsm] [Engine-devel] Proposal VDSM = Engine Data Statistics Retrieval Optimization
- Original Message -
On 03/13/2013 11:55 PM, Ayal Baron wrote:
...
The only reason we have this problem is that there is this thing against making multiple calls. Just split it up:
getVmRuntimeStats() - transient things like mem and cpu%
getVmInformation() - (semi)static things like disk/networking layout etc.
Each updated at different intervals.

+1 on splitting the data up into 2 separate API calls. You could potentially add a checksum (md5, or any other way) of the static data to getVmRuntimeStats and not even bother polling the VmInformation if it hasn't changed. Then you could poll the stats as often as you'd like and immediately see whether you also need to retrieve VmInfo (you rarely would).

+1 to Ayal's suggestion, except that instead of the engine hashing the data, VDSM sends the key, which is opaque to the engine. This can be a local timestamp or a generation number.

Of course vdsm does the hash, otherwise you'd need to pass all the data to the engine, which would defeat the purpose.

I thought you meant the engine would send the hash of previous requests per VM to vdsm, and vdsm would reply with the VMs removed, the VMs added, and the details for the VMs that changed (i.e., the engine would be doing something like an if-modified-since-checksum per VM). The benefit is saving a round trip. But first we would need to split the calls into stats (always changing) and slowly/never changing data.

If vdsm accepts the hash, then in your method the engine would have to periodically call getVmInfo(hash). What I was suggesting is that getVmStats would return the vmInfo hash so that we could avoid calling getVmInfo altogether. The stats *always* change, so there is no need to check whether that info has changed.
What we could do is avoid the split into 2 verbs by calling getVmStats(hash) and having getVmStats return everything if the hash has changed, or only the stats if it hasn't. This would require the fewest round trips and avoid the split.
If you don't pass a hash it would return everything, so this way it's also fully backward compatible.

But we might want to consider that once we add events, polling becomes (much) less frequent, so maybe it'll be overkill.

You'd still need to compare versions of the data in vdsm and send only if it changed. If you don't persist what was received last, you could potentially have a Monday-morning effect where, upon system startup, you'd be sending everything. So I still think you'd want to have the hash.

- Original Message -
From: Vinzenz Feenstra vfeen...@redhat.com
To: vdsm-devel@lists.fedorahosted.org, engine-de...@ovirt.org
Sent: Thursday, March 7, 2013 6:25:54 AM
Subject: [Engine-devel] Proposal VDSM = Engine Data Statistics Retrieval Optimization

Please find the prettier version on the wiki: http://www.ovirt.org/Proposal_VDSM_-_Engine_Data_Statistics_Retrieval

Proposal VDSM - Engine Data Statistics Retrieval
VDSM = Engine data retrieval optimization

Motivation:
Currently the RHEVM engine polls a lot of data from VDSM every 15 seconds. This should be optimized, and the data requested should be more specific. For each VM the reply currently contains much more information than is actually needed, which inflates the XML content considerably. We could optimize this by splitting the getVmStats reply into sections and returning them based on what the engine requests. For this reason Omer Frenkel and I have split the data into parts based on their usage.

This data can and usually does change during the lifetime of the VM.

Rarely Changed:
This data changes infrequently, and it should be enough to update it only once in a while. Most commonly this data changes after changes made in the UI or after a migration of the VM to another host.
Status = Running
acpiEnable = true
vmType = kvm
guestName = W864GUESTAGENTT
displayType = qxl
guestOs = Win 8
kvmEnable = true # this should be constant and never changed
pauseCode = NOERR
monitorResponse = 0
session = Locked # unused
netIfaces = [{'name': 'Realtek RTL8139C+ Fast Ethernet NIC', 'inet6': ['fe80::490c:92bb:bbcc:9f87'], 'inet': ['10.34.60.148'], 'hw': '00:1a:4a:22:3c:db'}]
appsList = ['RHEV-Tools 3.2.4', 'RHEV-Agent64 3.2.3', 'RHEV-Serial64 3.2.3', 'RHEV-Network64 3.2.2', 'RHEV-Network64 3.2.3', 'RHEV-Block64 3.2.3', 'RHEV-Balloon64 3.2.3', 'RHEV-Balloon64 3.2.2', 'RHEV-Agent64 3.2.2', 'RHEV-USB 3.2.3', 'RHEV-Block64 3.2.2', 'RHEV-Serial64 3.2.2']
pid = 11314
guestIPs = 10.34.60.148 # duplicated info
displayIp = 0
displayPort = 5902
displaySecurePort = 5903
username = user@W864GUESTAGENTT
clientIp =
lastLogin = 1361976900.67

Often Changed:
This data changes quite often; however, it is not necessary to update it every 15 seconds. As this is
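A minimal Python sketch of the single-verb variant discussed in this thread, getVmStats(hash): the frequently changing stats are always returned, and the rarely changing data is added only when the caller's hash is missing or stale. Everything here is an assumption for illustration, not VDSM's API: the function and parameter names, the `infoHash` reply key, and hashing a JSON dump with md5 (the thread only says "md5, or any other way"). The stats keys in the usage lines are placeholders; the info keys come from the field list above.

```python
import hashlib
import json


def info_hash(info):
    """md5 over a canonical JSON dump, so key order does not change the digest."""
    canonical = json.dumps(info, sort_keys=True).encode("utf-8")
    return hashlib.md5(canonical).hexdigest()


def get_vm_stats(stats, info, last_hash=None):
    """Hypothetical getVmStats(hash) for a single VM.

    stats: frequently changing values, always returned.
    info:  rarely changing values, returned only if last_hash is None
           (an old, hash-unaware caller) or differs from the current digest.
    """
    current = info_hash(info)
    reply = dict(stats)
    reply["infoHash"] = current  # the engine echoes this back on its next poll
    if last_hash != current:  # None never equals a hex digest
        reply.update(info)
    return reply


# Usage: an old caller gets everything; a caller holding the current hash
# gets only the stats plus the hash.
stats = {"cpuUser": "1.5", "memUsage": "30"}  # placeholder stats keys
info = {"vmType": "kvm", "displayType": "qxl", "guestOs": "Win 8"}

full = get_vm_stats(stats, info)
short = get_vm_stats(stats, info, full["infoHash"])
print(sorted(full))
print(sorted(short))
```

Keeping the reply flat is a design choice of this sketch: an engine that never sends a hash receives all fields in one dict plus one extra key, which is what makes the single verb backward compatible here.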
Re: [vdsm] [Engine-devel] Proposal VDSM = Engine Data Statistics Retrieval Optimization
On 17/03/13 15:13, Ayal Baron wrote:
- Original Message -
On 03/13/2013 11:55 PM, Ayal Baron wrote:
...
The only reason we have this problem is that there is this thing against making multiple calls. Just split it up:
getVmRuntimeStats() - transient things like mem and cpu%
getVmInformation() - (semi)static things like disk/networking layout etc.
Each updated at different intervals.

+1 on splitting the data up into 2 separate API calls. You could potentially add a checksum (md5, or any other way) of the static data to getVmRuntimeStats and not even bother polling the VmInformation if it hasn't changed. Then you could poll the stats as often as you'd like and immediately see whether you also need to retrieve VmInfo (you rarely would).

+1 to Ayal's suggestion, except that instead of the engine hashing the data, VDSM sends the key, which is opaque to the engine. This can be a local timestamp or a generation number.

Of course vdsm does the hash, otherwise you'd need to pass all the data to the engine, which would defeat the purpose.

I thought you meant the engine would send the hash of previous requests per VM to vdsm, and vdsm would reply with the VMs removed, the VMs added, and the details for the VMs that changed (i.e., the engine would be doing something like an if-modified-since-checksum per VM). The benefit is saving a round trip. But first we would need to split the calls into stats (always changing) and slowly/never changing data.

If vdsm accepts the hash, then in your method the engine would have to periodically call getVmInfo(hash). What I was suggesting is that getVmStats would return the vmInfo hash so that we could avoid calling getVmInfo altogether. The stats *always* change, so there is no need to check whether that info has changed.
What we could do is avoid the split into 2 verbs by calling getVmStats(hash) and having getVmStats return everything if the hash has changed, or only the stats if it hasn't. This would require the fewest round trips and avoid the split.
If you don't pass a hash it would return everything, so this way it's also fully backward compatible.

For the 'static' data, why is there a need for a hash? If VDSM sends a timestamp in each update, can't RHEVM just use if-modified-since with the last timestamp it got from VDSM? Is it cheaper for VDSM to calculate the hash than to update the timestamp on every change to any of the fields? Actually, it doesn't need to update the timestamp on every change, only on the first change since the last update was sent (so, in a way, a 'dirty' flag signifying data that RHEVM hasn't seen yet).
Y.

But we might want to consider that once we add events, polling becomes (much) less frequent, so maybe it'll be overkill.

You'd still need to compare versions of the data in vdsm and send only if it changed. If you don't persist what was received last, you could potentially have a Monday-morning effect where, upon system startup, you'd be sending everything. So I still think you'd want to have the hash.
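The alternative raised here, a marker that changes only on the first modification since the engine last fetched the data, can be sketched as a generation counter plus a dirty flag. The class and method names are hypothetical, and the engine passing back the last generation it saw (an if-modified-since style check) is an assumption about how such a marker would be used.

```python
class VmInfoTracker:
    """Hypothetical VDSM-side tracker for the rarely changing VM data.

    The generation number is bumped only on the first change after the
    engine last fetched the data, so it behaves like a 'dirty' flag rather
    than a per-change timestamp.
    """

    def __init__(self, info):
        self._info = dict(info)
        self._generation = 0
        self._dirty = False

    def update(self, key, value):
        if key in self._info and self._info[key] == value:
            return  # no real change, nothing to report
        self._info[key] = value
        if not self._dirty:  # first change since the last fetch
            self._generation += 1
            self._dirty = True

    def fetch(self, since=None):
        """Return (generation, info), or (generation, None) if nothing changed since `since`."""
        self._dirty = False
        if since == self._generation:
            return self._generation, None
        return self._generation, dict(self._info)


tracker = VmInfoTracker({"displayType": "qxl", "guestOs": "Win 8"})
gen, info = tracker.fetch()              # first fetch: everything
tracker.update("displayType", "vnc")     # bumps the generation once
tracker.update("displayPort", "5904")    # still dirty: no second bump
gen2, info2 = tracker.fetch(gen)         # changed: full data again
gen3, info3 = tracker.fetch(gen2)        # unchanged: info3 is None
```

The values after the first fetch are made up for the example. Note that the engine still receives the whole 'static' block when anything changed; the marker only tells it whether to expect one.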
[vdsm] Per device custom properties
Hi all,

Right now we have the ability to define VM-wide custom properties that may be used by hooks. It is time we had the same functionality on a per-device basis: http://www.ovirt.org/Features/Device_Custom_Properties

For example: the VM has 2 disks called disk1 and disk2 and two NICs called nic1 and nic2, and the admin (from the engine) added a custom property qos: 0.5 for nic1 and a custom property defrag: None for disk2. When the VM is started we'll run a hook for nic1 with its XML and qos: 0.5 added as an environment variable, and a hook for disk2 with its XML and defrag: None. When a device that has custom properties is hot plugged, we'll run its hook as well.

Implementation-wise, hot plug/unplug for disks and NICs is dead simple; vmCreate is more problematic. If the user set a custom property 'qos: 0.8' for nic3, I'd want it exposed as an environment variable called 'qos' for the hot plug NIC hooks, but for vmCreate I'd like to prefix it with the NIC's alias. However, when vmCreate is called we don't have the aliases for NICs and disks.

The proposed solution is to create a new hook point, called something like 'before_device_creation', that will be called before vmCreate. We'll then call that hook specifically for devices that contain custom properties, as described in the example above.

I would love to hear smarter ideas before I move forward.

Thanks!
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
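A rough sketch of which per-device hook invocations the proposal implies: only devices carrying custom properties get a call at the proposed 'before_device_creation' point, with each property exposed as an environment variable alongside the device XML. The device dictionaries, the `custom` and `xml` keys, and the function name are hypothetical; actually executing the hook scripts (for example with `subprocess.run(..., env=...)`) is left out.

```python
def plan_device_hooks(devices, hook_point="before_device_creation"):
    """Return one (hook_point, device_xml, env) tuple per device with custom properties.

    Because the hook runs once per device, a property can be exported under
    its plain name (e.g. 'qos') without needing the device alias as a prefix.
    """
    plans = []
    for dev in devices:
        props = dev.get("custom") or {}
        if not props:
            continue  # no custom properties: no device hook
        # Environment variable values must be strings.
        env = {name: str(value) for name, value in props.items()}
        plans.append((hook_point, dev["xml"], env))
    return plans


# The example from the mail: qos: 0.5 on nic1, defrag: None on disk2.
# No aliases are used, since they are not known yet at vmCreate time.
devices = [
    {"xml": "<disk device='disk'/>"},                                 # disk1
    {"xml": "<disk device='disk'/>", "custom": {"defrag": "None"}},   # disk2
    {"xml": "<interface type='bridge'/>", "custom": {"qos": "0.5"}},  # nic1
    {"xml": "<interface type='bridge'/>"},                            # nic2
]
for point, xml, env in plan_device_hooks(devices):
    print(point, xml, env)
```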
Re: [vdsm] [Engine-devel] Proposal VDSM = Engine Data Statistics Retrieval Optimization
- Original Message -
From: Ayal Baron aba...@redhat.com
To: Itamar Heim ih...@redhat.com
Cc: engine-de...@ovirt.org, vdsm-devel@lists.fedorahosted.org
Sent: Sunday, March 17, 2013 3:13:09 PM
Subject: Re: [Engine-devel] [vdsm] Proposal VDSM = Engine Data Statistics Retrieval Optimization

- Original Message -
On 03/13/2013 11:55 PM, Ayal Baron wrote:
...
The only reason we have this problem is that there is this thing against making multiple calls. Just split it up:
getVmRuntimeStats() - transient things like mem and cpu%
getVmInformation() - (semi)static things like disk/networking layout etc.
Each updated at different intervals.

+1 on splitting the data up into 2 separate API calls. You could potentially add a checksum (md5, or any other way) of the static data to getVmRuntimeStats and not even bother polling the VmInformation if it hasn't changed. Then you could poll the stats as often as you'd like and immediately see whether you also need to retrieve VmInfo (you rarely would).

+1 to Ayal's suggestion, except that instead of the engine hashing the data, VDSM sends the key, which is opaque to the engine. This can be a local timestamp or a generation number.

Of course vdsm does the hash, otherwise you'd need to pass all the data to the engine, which would defeat the purpose.

I thought you meant the engine would send the hash of previous requests per VM to vdsm, and vdsm would reply with the VMs removed, the VMs added, and the details for the VMs that changed (i.e., the engine would be doing something like an if-modified-since-checksum per VM). The benefit is saving a round trip. But first we would need to split the calls into stats (always changing) and slowly/never changing data.

If vdsm accepts the hash, then in your method the engine would have to periodically call getVmInfo(hash). What I was suggesting is that getVmStats would return the vmInfo hash so that we could avoid calling getVmInfo altogether. The stats *always* change, so there is no need to check whether that info has changed.
What we could do is avoid the split into 2 verbs by calling getVmStats(hash) and having getVmStats return everything if the hash has changed, or only the stats if it hasn't. This would require the fewest round trips and avoid the split. If you don't pass a hash it would return everything, so this way it's also fully backward compatible.

Actually, I assume we can pass hash 0 (to have vdsm return everything). I assume that the chances of the md5 of real data (i.e. real data that is known to the engine) being 0 are very slim.

But we might want to consider that once we add events, polling becomes (much) less frequent, so maybe it'll be overkill.

You'd still need to compare versions of the data in vdsm and send only if it changed. If you don't persist what was received last, you could potentially have a Monday-morning effect where, upon system startup, you'd be sending everything. So I still think you'd want to have the hash.
Re: [vdsm] [Engine-devel] Proposal VDSM = Engine Data Statistics Retrieval Optimization
- Original Message -
On 17/03/13 15:13, Ayal Baron wrote:
- Original Message -
On 03/13/2013 11:55 PM, Ayal Baron wrote:
...
The only reason we have this problem is that there is this thing against making multiple calls. Just split it up:
getVmRuntimeStats() - transient things like mem and cpu%
getVmInformation() - (semi)static things like disk/networking layout etc.
Each updated at different intervals.

+1 on splitting the data up into 2 separate API calls. You could potentially add a checksum (md5, or any other way) of the static data to getVmRuntimeStats and not even bother polling the VmInformation if it hasn't changed. Then you could poll the stats as often as you'd like and immediately see whether you also need to retrieve VmInfo (you rarely would).

+1 to Ayal's suggestion, except that instead of the engine hashing the data, VDSM sends the key, which is opaque to the engine. This can be a local timestamp or a generation number.

Of course vdsm does the hash, otherwise you'd need to pass all the data to the engine, which would defeat the purpose.

I thought you meant the engine would send the hash of previous requests per VM to vdsm, and vdsm would reply with the VMs removed, the VMs added, and the details for the VMs that changed (i.e., the engine would be doing something like an if-modified-since-checksum per VM). The benefit is saving a round trip. But first we would need to split the calls into stats (always changing) and slowly/never changing data.

If vdsm accepts the hash, then in your method the engine would have to periodically call getVmInfo(hash). What I was suggesting is that getVmStats would return the vmInfo hash so that we could avoid calling getVmInfo altogether. The stats *always* change, so there is no need to check whether that info has changed.
What we could do is avoid the split into 2 verbs by calling getVmStats(hash) and having getVmStats return everything if the hash has changed, or only the stats if it hasn't.
This would require the fewest round trips and avoid the split. If you don't pass a hash it would return everything, so this way it's also fully backward compatible.

For the 'static' data, why is there a need for a hash? If VDSM sends a timestamp in each update, can't RHEVM just use if-modified-since with the last timestamp it got from VDSM? Is it cheaper for VDSM to calculate the hash than to update the timestamp on every change to any of the fields? Actually, it doesn't need to update the timestamp on every change, only on the first change since the last update was sent (so, in a way, a 'dirty' flag signifying data that RHEVM hasn't seen yet).
Y.

As Saggi mentioned: "VDSM sends the key which is opaque to the engine. This can be a local timestamp or a generation number." The content doesn't matter; what matters is that it has changed. A timestamp assumes that vdsm will track changes and send only the delta. Although possible, this would be overkill (for every value in the dict you'd have to hold a timestamp of its last change and send only those values which have changed since the timestamp passed by the user). Either way, I don't care what the 'hash' is; the point was that there is a simple way to keep a single API call, keep BC (backward compatibility), and toggle between returning all the data or just the statistics (data that changes frequently) since the last time the user checked, while minimizing API calls.

But we might want to consider that once we add events, polling becomes (much) less frequent, so maybe it'll be overkill.

You'd still need to compare versions of the data in vdsm and send only if it changed. If you don't persist what was received last, you could potentially have a Monday-morning effect where, upon system startup, you'd be sending everything. So I still think you'd want to have the hash.
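To see why per-field timestamps are called overkill here, the following is a sketch of what VDSM would have to keep in that scheme: a change stamp for every key, plus a scan over all keys on each request. A logical counter stands in for wall-clock timestamps to avoid ties between changes made within the same clock tick; the class and its methods are hypothetical.

```python
class FieldChangeLog:
    """Per-field change tracking: one stamp per key, a delta on request."""

    def __init__(self):
        self._tick = 0         # logical clock, bumped on every real change
        self._values = {}
        self._changed_at = {}  # key -> tick of its last change

    def set(self, key, value):
        if key in self._values and self._values[key] == value:
            return  # unchanged value: keep the old stamp
        self._tick += 1
        self._values[key] = value
        self._changed_at[key] = self._tick

    def changed_since(self, since):
        """Return (current tick, {key: value} for keys changed after `since`)."""
        delta = {k: self._values[k] for k, t in self._changed_at.items() if t > since}
        return self._tick, delta


log = FieldChangeLog()
log.set("displayType", "qxl")
log.set("guestOs", "Win 8")
seen, delta = log.changed_since(0)     # first request: every field
log.set("displayType", "vnc")
seen, delta = log.changed_since(seen)  # only displayType comes back
```

With a single opaque key per VM, by contrast, VDSM only has to signal that something changed, not which field did, so one counter or digest per VM is enough.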
Re: [vdsm] [Engine-devel] Proposal VDSM = Engine Data Statistics Retrieval Optimization
- Original Message -
- Original Message -
From: Ayal Baron aba...@redhat.com
To: Itamar Heim ih...@redhat.com
Cc: engine-de...@ovirt.org, vdsm-devel@lists.fedorahosted.org
Sent: Sunday, March 17, 2013 3:13:09 PM
Subject: Re: [Engine-devel] [vdsm] Proposal VDSM = Engine Data Statistics Retrieval Optimization

- Original Message -
On 03/13/2013 11:55 PM, Ayal Baron wrote:
...
The only reason we have this problem is that there is this thing against making multiple calls. Just split it up:
getVmRuntimeStats() - transient things like mem and cpu%
getVmInformation() - (semi)static things like disk/networking layout etc.
Each updated at different intervals.

+1 on splitting the data up into 2 separate API calls. You could potentially add a checksum (md5, or any other way) of the static data to getVmRuntimeStats and not even bother polling the VmInformation if it hasn't changed. Then you could poll the stats as often as you'd like and immediately see whether you also need to retrieve VmInfo (you rarely would).

+1 to Ayal's suggestion, except that instead of the engine hashing the data, VDSM sends the key, which is opaque to the engine. This can be a local timestamp or a generation number.

Of course vdsm does the hash, otherwise you'd need to pass all the data to the engine, which would defeat the purpose.

I thought you meant the engine would send the hash of previous requests per VM to vdsm, and vdsm would reply with the VMs removed, the VMs added, and the details for the VMs that changed (i.e., the engine would be doing something like an if-modified-since-checksum per VM). The benefit is saving a round trip. But first we would need to split the calls into stats (always changing) and slowly/never changing data.

If vdsm accepts the hash, then in your method the engine would have to periodically call getVmInfo(hash). What I was suggesting is that getVmStats would return the vmInfo hash so that we could avoid calling getVmInfo altogether.
The stats *always* change, so there is no need to check whether that info has changed.
What we could do is avoid the split into 2 verbs by calling getVmStats(hash) and having getVmStats return everything if the hash has changed, or only the stats if it hasn't. This would require the fewest round trips and avoid the split. If you don't pass a hash it would return everything, so this way it's also fully backward compatible.

Actually, I assume we can pass hash 0 (to have vdsm return everything). I assume that the chances of the md5 of real data (i.e. real data that is known to the engine) being 0 are very slim.

We'd need to support hash=None to keep backward compatibility anyway; plus, this way there are no assumptions about the hash algorithm, so why bother with hash=0?

But we might want to consider that once we add events, polling becomes (much) less frequent, so maybe it'll be overkill.

You'd still need to compare versions of the data in vdsm and send only if it changed. If you don't persist what was received last, you could potentially have a Monday-morning effect where, upon system startup, you'd be sending everything. So I still think you'd want to have the hash.
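On the engine side, a hash=None default covers both the first poll and engines that predate the change, so no digest value has to be reserved as a sentinel. The sketch below pairs a stand-in for the VDSM verb with an engine-side poll that merges each reply into a cached view of the VM. The names, the `infoHash` reply key, and the md5-over-JSON digest are assumptions for illustration; `cpuUser` is a placeholder stats key and the changed `displayType` value is made up.

```python
import hashlib
import json


def make_fake_vdsm(stats, info):
    """Stand-in for a getVmStats(infoHash=None) verb; purely illustrative."""
    def get_vm_stats(info_hash=None):
        current = hashlib.md5(json.dumps(info, sort_keys=True).encode()).hexdigest()
        reply = dict(stats, infoHash=current)
        if info_hash != current:  # None (first poll or old engine) never matches
            reply.update(info)
        return reply
    return get_vm_stats


def poll_vm(get_vm_stats, view):
    """Engine side: send the last seen hash and merge the reply into the cached view.

    Rarely changing fields stay in the view from earlier replies whenever
    VDSM omits them.
    """
    reply = get_vm_stats(view.get("infoHash"))  # None on the very first poll
    view.update(reply)
    return reply


info = {"displayType": "qxl", "guestOs": "Win 8"}
stats = {"cpuUser": "1.5"}
vdsm = make_fake_vdsm(stats, info)

view = {}
first = poll_vm(vdsm, view)    # no hash yet: full reply
second = poll_vm(vdsm, view)   # hash matches: stats and hash only
info["displayType"] = "vnc"    # a rarely changing field changes on the host
third = poll_vm(vdsm, view)    # hash differs: full reply again
```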