On 04/22/2013 10:39 PM, Ayal Baron wrote:
I haven't started writing patches for this particular thing yet. I want to, but first I want to know that the approach will be acceptable, so that I won't have to throw away work because the project doesn't accept it that way.

----- Original Message -----
I have left this for a while without continuing because I had to focus on other things. However this is still in progress :-)

Are you writing patches? (If so, what solution are you pursuing?)
My ultimate goal is to reduce the overall traffic and the amount of data the engine has to parse on a per Host basis.
This can be achieved without already having push notifications working.

Imagine that, with 1000 VMs running, the oVirt engine currently has to parse ~16 MiB of XML data every 15 seconds. And that's just the result from getAllVmStats. Additionally, it polls 'list' every 2-3 seconds to retrieve the VmIds and statuses.
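To put the figures above into perspective, here is a back-of-the-envelope calculation in Python. It only uses the numbers quoted above (1000 VMs, ~16 MiB, 15 seconds); the helper name polling_load is made up for this illustration and the result is an estimate, not a measurement.

```python
# Back-of-the-envelope estimate based only on the figures quoted above.
# polling_load() is a hypothetical helper, not part of VDSM or the engine.
MIB = 1024 * 1024


def polling_load(total_bytes, vm_count, interval_s):
    """Return (bytes per VM per poll, bytes per second) for one polling cycle."""
    per_vm = total_bytes / vm_count
    per_second = total_bytes / interval_s
    return per_vm, per_second


per_vm, per_second = polling_load(16 * MIB, 1000, 15)
per_hour_gib = per_second * 3600 / (1024 * MIB)

print(f"{per_vm / 1024:.1f} KiB per VM per poll")      # 16.4 KiB
print(f"{per_second / MIB:.2f} MiB/s of XML to parse")  # 1.07 MiB/s
print(f"{per_hour_gib:.2f} GiB per hour")               # 3.75 GiB
```

So even before counting the 'list' polling, that is roughly 16 KiB of XML per VM on every cycle.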
Now I have split the values which the engine would get returned from getVmStats into groups based on how likely they are to change AND their size. getAllVmRuntimeStats(), for example, returns the hashes of the different parts where it makes sense; however, it won't give you the devices, for example, because the engine still has to poll those every 5 minutes or so to retrieve stats for the data warehouse.
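To make the hash idea a bit more concrete, here is a minimal Python sketch of what the engine side could do with the hashes returned by getAllVmRuntimeStats(). The function name groups_to_refresh, the group names and the shape of the reply are assumptions for illustration only; they are not taken from existing VDSM or engine code.

```python
# Sketch only: group names and reply layout are assumptions based on the
# proposal in this thread, not an existing VDSM or engine API.

def groups_to_refresh(cached_hashes, runtime_stats):
    """Compare the hashes reported per VM with the ones the engine cached.

    cached_hashes: {vmId: {groupName: hash}} as last seen by the engine.
    runtime_stats: {vmId: {"hashes": {groupName: hash}, ...}} as it might be
                   returned by the proposed getAllVmRuntimeStats().
    Returns {groupName: [vmId, ...]} for every group whose hash changed or
    is not known yet.
    """
    stale = {}
    for vm_id, stats in runtime_stats.items():
        known = cached_hashes.get(vm_id, {})
        for group, value in stats.get("hashes", {}).items():
            if known.get(group) != value:
                stale.setdefault(group, []).append(vm_id)
    return stale


cached = {"vm-1": {"info": "a1", "status": "b1", "guestDetails": "c1"}}
reply = {
    "vm-1": {"cpuSys": 2.32,
             "hashes": {"info": "a1", "status": "b2", "guestDetails": "c1"}},
    "vm-2": {"cpuSys": 0.10,
             "hashes": {"info": "d1", "status": "e1", "guestDetails": "f1"}},
}
stale = groups_to_refresh(cached, reply)
print(stale)
```

With this, the engine would only ask for the 'status' group of vm-1, plus everything for the previously unknown vm-2, instead of re-fetching all data for all VMs on every cycle.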
I am personally for having push notifications properly implemented and working; however, the basis of this idea is reducing parser and validation work, which will help the engine backend to scale better. In particular, it should reduce the load on the database as well, at least theoretically (I have no numbers for this one ;-))
On 03/13/2013 10:55 PM, Ayal Baron wrote:

----- Original Message -----
----- Original Message -----
From: "Ayal Baron" <aba...@redhat.com>
To: "Saggi Mizrahi" <smizr...@redhat.com>
Cc: engine-de...@ovirt.org, vdsm-devel@lists.fedorahosted.org, "Vinzenz Feenstra" <vfeen...@redhat.com>
Sent: Wednesday, March 13, 2013 5:39:24 PM
Subject: Re: [vdsm] [Engine-devel] Proposal VDSM <=> Engine Data Statistics Retrieval Optimization

----- Original Message -----
I am completely against this. It makes the return value differ according to input, which is a big no-no when talking about type-safe APIs. The only reason we have this problem is because there is this thing against making multiple calls.

Which is totally counterproductive, because multiple calls, if properly split up, will actually lead to less data being sent for frequently needed data calls. And the others shall be triggered when necessary.

Just split it up.
getVmRuntimeStats() - transient things like mem and cpu%
getVmInformation() - (semi)static things like disk\networking layout etc.
Each updated at different intervals.

+1 on splitting the data up into 2 separate API calls. You could potentially add a checksum (md5, or any other way) of the "static" data to getVmRuntimeStats and not even bother with polling the VmInformation if this hasn't changed. Then you could poll the stats as often as you'd like and immediately see if you also need to retrieve VmInfo or not (you rarely would).

+1 To Ayal's suggestion, except that instead of the engine hashing the data, VDSM sends the key, which is opaque to the engine. This can be a local timestamp or a generation number.

Of course vdsm does the hash, otherwise you'd need to pass all the data to engine, which would defeat the purpose.

We need the hash if we can't have dynamic content.
Generation numbers aren't really helpful as every call aggregates the statistics data newly, at the moment at least.

But we might want to consider that once we add events, polling becomes (much) less frequent, so maybe it'll be overkill.

You'd still need to compare versions of the data in vdsm and send only if it changed. If you don't persist what was received last, then potentially you could have a Monday morning effect where upon system startup you'd be sending everything. So I still think you'd want to have the hash.

We do a hash already on the XML and include it in the getStats response. Hashes should show enough difference.

Now to the non-dynamic responses and 'type-safe' API:
If we would go for non-dynamic responses, we would need for sure 5 new API calls to achieve some gain on the amount of data sent.

getAllVmRuntimeStats() "returns a map of vmId/data pairs for all vms"
# All the time changing data which is needed by the oVirt Engine, or so often changing that it does not make sense
# to place it anywhere else
{
    VmId: {
        cpuSys  --> Could be potentially summarized
        cpuUser -/
        memUsage
        elapsedTime,
        status
        statsAge
        hashes = {
            conf,          # Hashed information of the XML (This one is called "hash" in getStats())
            info,          # Hashed information of semi static items
            statusHash:    # Hashed information of items which are likely to change, however not that often
            guestDetails:  # Hashed value of the guest details (applicationList, network information)
        }
    }
}

getVmStatuses([vmId1, vmId2, ...]) "Returns a vmId/data pair for each vm requested"
# This data does not change that often and can be retrieved on demand once the hash changes
return {
    vmId: {
        timeOffset,
        monitorResponse
        clientIp,
        lastLogin,
        username,
        session,
        guestIPs,
    }
}

getAllVmDeviceStatistics(): "Returns a vmId/data pair for all vms"
# This data has to be requested all the time, however at longer intervals (e.g. every 5 minutes)
# And is usually needed for all the VMs anyway
return {
    vmId: {
        network,
        disksUsage, # Might be improved by summarizing?
        disks,
        balloonInfo,
        memoryStats
    }
}

getVmInfo([vmId1, vmId2, ...]) "Returns a vmId/data pair for each vm requested"
# Basically this should be almost constant, except if there have been changes like migrations, pausing, errors etc
return {
    vmId: {
        acpiEnable,
        vmType,
        guestName,
        guestOS,
        kvmEnable,
        pauseCode,
        displayIp,
        displayPort,
        displaySecurePort,
        pid,
    }
}

getVmGuestDetails([vmId1, vmId2, ...])
# Data which changes seldom and these changes can be reflected in the hash when this needs to be requested
# This data is really only necessary when it really has been changed or needs to be refreshed for whatever reason.
return {
    vmId: {
        appsList,
        netIfaces,
    }
}

----- Original Message -----
From: "Vinzenz Feenstra" <vfeen...@redhat.com>
To: vdsm-devel@lists.fedorahosted.org, engine-de...@ovirt.org
Sent: Thursday, March 7, 2013 6:25:54 AM
Subject: [Engine-devel] Proposal VDSM <=> Engine Data Statistics Retrieval Optimization

Please find the prettier version on the wiki:
http://www.ovirt.org/Proposal_VDSM_-_Engine_Data_Statistics_Retrieval

Proposal VDSM - Engine Data Statistics Retrieval

VDSM <=> Engine data retrieval optimization

Motivation:
Currently the RHEVM engine is polling a lot of data from VDSM every 15 seconds. This should be optimized, and the amount of data requested should be more specific. For each VM the data currently contains much more information than actually needed, which blows up the size of the XML content considerably. We could optimize this by splitting the reply to getVmStats into sections, based on the request of the engine. For this reason Omer Frenkel and I have split up the data into parts based on their usage. This data can and usually does change during the lifetime of the VM.

Rarely Changed:
This data does not change very frequently, and it should be enough to update it only once in a while.
Most commonly this data changes after changes made in the UI or after a migration of the VM to another Host.

Status = Running
acpiEnable = true
vmType = kvm
guestName = W864GUESTAGENTT
displayType = qxl
guestOs = Win 8
kvmEnable = true # this should be constant and never changed
pauseCode = NOERR
monitorResponse = 0
session = Locked # unused
netIfaces = [{'name': 'Realtek RTL8139C+ Fast Ethernet NIC', 'inet6': ['fe80::490c:92bb:bbcc:9f87'], 'inet': ['10.34.60.148'], 'hw': '00:1a:4a:22:3c:db'}]
appsList = ['RHEV-Tools 3.2.4', 'RHEV-Agent64 3.2.3', 'RHEV-Serial64 3.2.3', 'RHEV-Network64 3.2.2', 'RHEV-Network64 3.2.3', 'RHEV-Block64 3.2.3', 'RHEV-Balloon64 3.2.3', 'RHEV-Balloon64 3.2.2', 'RHEV-Agent64 3.2.2', 'RHEV-USB 3.2.3', 'RHEV-Block64 3.2.2', 'RHEV-Serial64 3.2.2']
pid = 11314
guestIPs = 10.34.60.148 # duplicated info
displayIp = 0
displayPort = 5902
displaySecurePort = 5903
username = user@W864GUESTAGENTT
clientIp =
lastLogin = 1361976900.67

Often Changed:
This data changes quite often; however, it is not necessary to update it every 15 seconds. As this is cumulative data that reflects the current status, it does not need to be snapshotted every 15 seconds to retrieve statistics. The data can be retrieved in much more generous time slices (e.g. every 5 minutes).

network = {'vnet1': {'macAddr': '00:1a:4a:22:3c:db', 'rxDropped': '0', 'txDropped': '0', 'rxErrors': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '100', 'name': 'vnet1'}}
disksUsage = [{'path': 'c:\\', 'total': '64055406592', 'fs': 'NTFS', 'used': '19223846912'}, {'path': 'd:\\', 'total': '3490912256', 'fs': 'UDF', 'used': '3490912256'}]
timeOffset = 14422
elapsedTime = 68591
hash = 2335461227228498964
statsAge = 0.09 # unused

Often Changed but unused:
This data does not seem to be used in the engine at all. It is not even used in the data warehouse.
memoryStats = {'swap_out': '0', 'majflt': '0', 'mem_free': '1466884', 'swap_in': '0', 'pageflt': '0', 'mem_total': '2096736', 'mem_unused': '1466884'}
balloonInfo = {'balloon_max': 2097152, 'balloon_cur': 2097152}
disks = {'vda': {'readLatency': '0', 'apparentsize': '64424509440', 'writeLatency': '1754496', 'imageID': '28abb923-7b89-4638-84f8-1700f0b76482', 'flushLatency': '156549', 'readRate': '0.00', 'truesize': '18855059456', 'writeRate': '952.05'}, 'hdc': {'readLatency': '0', 'apparentsize': '0', 'writeLatency': '0', 'flushLatency': '0', 'readRate': '0.00', 'truesize': '0', 'writeRate': '0.00'}}

Very frequent updates needed by webadmin portal:
This data is mostly needed for the webadmin portal and might be required to be updated quite often. An exception here is the statsAge field, which seems to be unused by the Engine. This data could be requested every 15 seconds to keep things as they are now.

cpuSys = 2.32
cpuUser = 1.34
memUsage = 30

Proposed Solution for VDSM & Engine:
We will introduce new optional parameters to getVmStats, getAllVmStats and list to allow a finer grained specification of the data which should be included.

Parameter: statsType = <string> (getVmStats, getAllVmStats only)
Allowed values:
* full (default to keep backwards compatibility)
* app-list (Just send the application list)
* rare (include everything from rarely changed to very frequent)
* often (include everything from often changed to very frequent)
* frequent (only send the very frequently changed items)

Parameter: clientId = <string>
The client id is specified by the client and should be unique but used consistently.

Parameter: diff = <boolean>
In combination with the clientId, VDSM will send only the differences to the previous request from the named clientId.
(if diff=true)

Additional Change:
Besides the introduction of the new parameters for list, getVmStats and getAllVmStats, it might make sense to include a hash of the appList in the rarely changed section of the response. This would allow identifying changes and avoid having to send the complete appList every so often; it would be sent only if the hash known to the client is outdated.

Note: The appList (Application List) reported by the guest agent could be fully implemented on request only, as long as the installed guest agent supports this. As there seems to be a request to have the complete list of installed applications on all guests, this data could be quite extensive and a huge list. On the other hand, this data is only rarely visible and therefore it should not be requested all the time, only on demand.

Improvement of the Guest Agent:
As part of the proposed solution it is necessary to improve the guest agent as well. For the full application list, a caching system should be implemented which is fully reactive and does not, for example, poll the application list all the time. The guest can create a prepared data file containing all data in the JSON format (as used for the communication with VDSM via VIO) and then just has to read that file from disk and send it directly to VDSM. However, it is quite possible that this list is too big and has to be chunked into pieces (multiple messages, which would then have to be supported by VDSM as well). The solution for this is to make VDSM request this data, so that it retrieves the necessary data on request only.

--
Regards,
Vinzenz Feenstra | Senior Software Engineer
RedHat Engineering Virtualization R & D
Phone: +420 532 294 625
IRC: vfeenstr or evilissimo
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
_______________________________________________
Engine-devel mailing list
engine-de...@ovirt.org
http://lists.ovirt.org/mailman/listinfo/engine-devel
_______________________________________________
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel