Re: [Openstack] [ceilometer] meter data volume
On 2012-10-31 21:56, Julien Danjou wrote:
> On Wed, Oct 31 2012, 吴亚伟 wrote:
>> 1) '12389000' nanoseconds means '123.89' seconds, or two minutes, but it seems to be 1238.9 seconds actually. Is there something wrong?
>
> Why do you think it's 1238.9 seconds?

Well, to be honest, I really did think it was an uptime...

>> 3) If I reboot or suspend vm_1, I find that the 'counter_volume' of the cpu usage record counts from zero again, like '8 minutes' - '18 minutes' - '28 minutes' [- '0 minutes'] - '5 minutes' - '15 minutes'. Does that mean 'counter_volume' just represents how long vm_1 has been booted up?
>
> Not at all. It means the CPU time consumed is reset to 0, but that's not an issue in itself; the API should be capable of dealing with that if you ask for the total usage.

Is the API capable of dealing with that at present? If not, when will it be?

>> 4) This one is about the Web API. I find that GET /v1/resources/(resource)/meters/(meter)/volume/sum just returns the sum of all the cpu 'counter_volume' values, like '8 minutes' + '18 minutes'. Is it reduplicate?
>
> Don't understand what you mean, but the CPU counter is a cumulative one, and asking for its sum is nonsense. You want to ask for (max - min) to get the used value, something which is not in the API yet.

Is the Web API in the documentation going to be updated soon?

Thanks
---
Yawei Wu
Dalian Hi-Think Computer Technology, Corp.

___
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
Re: [Openstack] [ceilometer] meter data volume
Hi Eoghan,

Thanks for your reply.

As we can see from the documentation, three types of meters are defined in ceilometer:

Type        Definition
Cumulative  Increasing over time (instance hours)
Gauge       Discrete items (floating IPs, image uploads) and fluctuating values (disk I/O)
Delta       Changing over time (bandwidth)

The cumulative type is clear, but even with these descriptions the gauge and delta types confuse me. Could you explain them through examples or by sharing a use case?

Thanks
---
Yawei Wu
Dalian Hi-Think Computer Technology, Corp.
Re: [Openstack] [ceilometer] meter data volume
On Wed, Oct 31 2012, Eoghan Glynn wrote:

> Yep the sum of local maxima is not lossy as long as the requested duration completely encapsulates the compute agent outage (and the instance doesn't restart during the outage).

Actually, if there's one restart, it still _can_ be safe in certain circumstances, such as in a case like:

Time | Value
   0 | 1000
   1 | 3000 (agent down)
   2 |    0 (agent down)
   3 |   80
   4 |  100

In this particular case, where your agent was down at t1 and t2, the API will detect that the counter was reset while the agent was down. With the cumulative model, the loss is again smaller than with the computed-delta model.

OTOH, both models fail to get some data in a case like:

Time | Value
   0 | 1000
   1 | 3000 (agent down)
   2 |    0 (agent down)
   3 | 8000
   4 |    1

> However I was more thinking of the scenario where the duration requested via the API is say t1..t4 in your example above.
>
> In any case, do we need a new measurement type, in addition to the existing CUMULATIVE type, that captures the non-monotonic nature of the measure and alerts the API that special handling is required to compute say max-min?
>
> Something like TRANSIENT_CUMULATIVE, if that's not too much of a mouthful.

We discussed it already with Doug, and came to the conclusion that we don't, because the monotonic case is just a special case of the non-monotonic one. So applying the computing method for the non-monotonic case will solve every problem.

--
Julien Danjou
-- Free Software hacker freelance
-- http://julien.danjou.info
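The two tables above can be replayed with a minimal sketch of reset-aware accounting. The function name `usage_seen_by_api` is made up for illustration and is not an existing ceilometer call; the sketch assumes a drop between two observed values means the counter restarted from 0. Only rows t0..t3 of the second table are used.

```python
def usage_seen_by_api(observed):
    """Sum the increases between consecutive cumulative values.

    A decrease is read as a counter reset to 0, so the value seen after the
    reset counts in full.
    """
    total = 0
    for prev, cur in zip(observed, observed[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


# First table: the agent is down at t1 and t2, so the API sees t0, t3, t4.
case1_full = [1000, 3000, 0, 80, 100]
case1_seen = [1000, 80, 100]

# Second table (rows t0..t3): the API sees only t0 and t3.
case2_full = [1000, 3000, 0, 8000]
case2_seen = [1000, 8000]

for name, full, seen in (("case 1", case1_full, case1_seen),
                         ("case 2", case2_full, case2_seen)):
    print(name, "actual:", usage_seen_by_api(full),
          "derived from what was seen:", usage_seen_by_api(seen))
```

In the first case the drop from 1000 to 80 reveals the reset, and only the usage between 1000 and 3000 is lost. In the second case the value after the reset is already above the last one seen, so nothing signals the reset and the usage is undercounted without any hint.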
Re: [Openstack] [ceilometer] meter data volume
On Thu, Nov 01 2012, 吴亚伟 wrote:

> Is the API capable of dealing with that at present? If not, when will it be?

Not yet. When someone writes the code!

> Is the Web API in the documentation going to be updated soon?

We hope so!

--
Julien Danjou
-- Free Software hacker freelance
-- http://julien.danjou.info
Re: [Openstack] [ceilometer] meter data volume
On Thu, Nov 01 2012, 吴亚伟 wrote:

> The cumulative type is clear, but even with these descriptions the gauge and delta types confuse me. Could you explain them through examples or by sharing a use case?

Gauge is an absolute value, like a temperature or the number of people in a room.

Delta is a counter where each value is the difference between the current and the previous value. Each value represents how many things were consumed since the last time a value was sent. It can always be compared to a counter that resets to 0 once you read it.

--
Julien Danjou
// Free Software hacker freelance
// http://julien.danjou.info
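As a small illustration of the three meter types described above, the sketch below uses invented sample values and a hypothetical helper, `to_deltas`; none of it is ceilometer code.

```python
def to_deltas(cumulative):
    """Turn cumulative readings into deltas (each value minus the previous one)."""
    return [cur - prev for prev, cur in zip(cumulative, cumulative[1:])]


# Cumulative: total CPU minutes consumed so far, as read at each poll.
cumulative_cpu_minutes = [8, 18, 28]

# Delta: what was consumed since the previous poll.
delta_cpu_minutes = to_deltas(cumulative_cpu_minutes)

# Gauge: an absolute value at each poll, like the number of people in a room.
people_in_room = [3, 7, 4]

# Usage over the window: last minus first for cumulative, a sum for delta.
total_from_cumulative = cumulative_cpu_minutes[-1] - cumulative_cpu_minutes[0]
total_from_delta = sum(delta_cpu_minutes)

# For a gauge, summing has no meaning; the latest value or an average does.
average_people = sum(people_in_room) / len(people_in_room)

print(delta_cpu_minutes, total_from_cumulative, total_from_delta, average_people)
```

Which aggregate makes sense therefore depends on the type: summing suits delta values, while a cumulative meter needs a difference between readings.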
Re: [Openstack] [ceilometer] meter data volume
On Wed, Oct 31, 2012 at 1:39 PM, Julien Danjou jul...@danjou.info wrote:

> On Wed, Oct 31 2012, Eoghan Glynn wrote:
>
>> Would we also have some 'misses' with the cumulative approach when the ceilometer agent was down?
>
> No, unless the counter resets several times while your agent is down. But delta has the same issue.
>
>> If I understood the (\Sigma local maxima)-first idea correctly, the usage up to the first polling cycle would always be discounted from any duration.
>
> No, because if you have:
>
> Time | Value
>    0 |  10
>    1 |  30
>    2 |  50
>    3 |  80
>    4 | 100
>
> If your delta-pollster is down at 1 and 2 and you restart it at 3, then at 4 you'll send 20 as usage (100 minus 80). So you miss the delta between 10 (time 0) and 80 (time 3) (therefore 70 for free!). If you send 80 right away at time 3 when restarting, the API will be able to guess that between 0 and 3 the value went from 10 to 80.
>
> With the delta approach, the API cannot guess that.

Sure it can, you just need to move where the caching is done. Using a local cache to keep the value from the last time one was published, you would know at time 3 that the last published value was 10, and so send 70. So the total will be correct.

Doug
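The caching idea can be sketched as follows, assuming the cache survives the agent restart. Here a plain dict stands in for it; `publish_deltas` and the key `"vm_1:cpu"` are names invented for this example.

```python
READINGS = {0: 10, 1: 30, 2: 50, 3: 80, 4: 100}   # Time -> Value, as in the table


def publish_deltas(readings, up_times, cache, key="vm_1:cpu"):
    """Return the deltas published at each poll in `up_times`.

    `cache` holds the last published cumulative value per key and is assumed
    to outlive the agent process.
    """
    published = []
    for t in sorted(up_times):
        current = readings[t]
        previous = cache.get(key)
        if previous is not None:
            published.append(current - previous)
        cache[key] = current
    return published


cache = {}                                                # stands in for a persistent store
before_outage = publish_deltas(READINGS, [0], cache)      # first poll, nothing to diff against
after_restart = publish_deltas(READINGS, [3, 4], cache)   # agent was down at 1 and 2
print(before_outage, after_restart, sum(after_restart))
```

With the cached value 10 still available at time 3, the agent publishes 70 and then 20, so the total of 90 matches the counter's movement from 10 to 100. The sketch does not handle a counter reset, which would show up as a negative delta.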
Re: [Openstack] [ceilometer] meter data volume
>> if you have:
>>
>> Time | Value
>>    0 |  10
>>    1 |  30
>>    2 |  50
>>    3 |  80
>>    4 | 100
>>
>> If your delta-pollster is down at 1 and 2 and you restart it at 3, then at 4 you'll send 20 as usage (100 minus 80). So you miss the delta between 10 (time 0) and 80 (time 3) (therefore 70 for free!). If you send 80 right away at time 3 when restarting, the API will be able to guess that between 0 and 3 the value went from 10 to 80.
>>
>> With the delta approach, the API cannot guess that.
>
> Sure it can, you just need to move where the caching is done. Using a local cache to keep the value from the last time one was published, you would know at time 3 that the last published value was 10, and so send 70. So the total will be correct.

Good point. Previously, IIUC, there was an implicit assumption that any caching of previous values would be done in memory, and hence lost across process restarts. But as you point out, these data could be persisted locally by the compute agent.

What would be the best way to achieve this? A small sqlite DB per agent, or even simpler, just a pickled dict? The latter would avoid the complexity of DB versioning and migration.

Cheers,
Eoghan
Re: [Openstack] [ceilometer] meter data volume
On Thu, Nov 1, 2012 at 1:31 PM, Eoghan Glynn egl...@redhat.com wrote:

>>> if you have:
>>>
>>> Time | Value
>>>    0 |  10
>>>    1 |  30
>>>    2 |  50
>>>    3 |  80
>>>    4 | 100
>>>
>>> If your delta-pollster is down at 1 and 2 and you restart it at 3, then at 4 you'll send 20 as usage (100 minus 80). So you miss the delta between 10 (time 0) and 80 (time 3) (therefore 70 for free!). If you send 80 right away at time 3 when restarting, the API will be able to guess that between 0 and 3 the value went from 10 to 80.
>>>
>>> With the delta approach, the API cannot guess that.
>>
>> Sure it can, you just need to move where the caching is done. Using a local cache to keep the value from the last time one was published, you would know at time 3 that the last published value was 10, and so send 70. So the total will be correct.
>
> Good point. Previously, IIUC, there was an implicit assumption that any caching of previous values would be done in memory, and hence lost across process restarts. But as you point out, these data could be persisted locally by the compute agent.
>
> What would be the best way to achieve this? A small sqlite DB per agent, or even simpler, just a pickled dict? The latter would avoid the complexity of DB versioning and migration.

I discussed this issue at the summit with James Penick of Yahoo, and he showed me some code in their agent that uses a sqlite db. We will want to build a nice API so pollsters can use the cache without having to worry about how it is implemented, which would let us deal with any versioning issues in a central spot.

Doug
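One possible shape for such a cache API, sketched with Python's standard `sqlite3` module. The class `PreviousValueCache`, its table layout and the key format are assumptions made for illustration; this is neither the Yahoo agent code mentioned above nor anything in ceilometer.

```python
import sqlite3


class PreviousValueCache:
    """Hypothetical helper that persists the last value seen per meter key."""

    def __init__(self, path):
        # `path` is a file on the compute node, so values survive agent restarts.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS previous "
            "(key TEXT PRIMARY KEY, value INTEGER)"
        )
        self._db.commit()

    def get(self, key):
        """Return the stored value for `key`, or None if nothing was stored."""
        row = self._db.execute(
            "SELECT value FROM previous WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]

    def put(self, key, value):
        """Store `value` for `key`, replacing any earlier value."""
        self._db.execute(
            "INSERT OR REPLACE INTO previous (key, value) VALUES (?, ?)",
            (key, value),
        )
        self._db.commit()

    def close(self):
        self._db.close()
```

A pollster would call `get` before publishing and `put` afterwards. Keeping the storage behind two methods is what would let the backing store, or its schema, change in one central place.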
Re: [Openstack] [ceilometer] meter data volume
On Thu, Nov 01 2012, Eoghan Glynn wrote:

> What would be the best way to achieve this? A small sqlite DB per agent, or even simpler, just a pickled dict? The latter would avoid the complexity of DB versioning and migration.

At the risk of repeating myself, can I stress again how much we don't need to transform cumulative into delta, and certainly not in the pollster/agents/notifications code?

--
Julien Danjou
/* Free Software hacker freelance
   http://julien.danjou.info */
Re: [Openstack] [ceilometer] meter data volume
On Thu, Nov 01 2012, Eoghan Glynn wrote:

> Well, local persistence of the previous times would still be of use, I think, for the in-pollster CPU util % calculation (as discussed here[1]).

All right, so that may be needed, but not in the pollster. It'll be needed in the CW publisher that will compute that, because CW doesn't offer any other means (I imagine) to handle cumulative values like Ceilometer does. :)

(But I know that currently we don't have multi-publisher, and that's why it's done that way in #14921 :)

--
Julien Danjou
// Free Software hacker freelance
// http://julien.danjou.info
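To show why a previous sample is needed for a CPU util % figure, here is a minimal sketch of one common way to derive it from two cumulative CPU time readings. The function `cpu_util_percent` and its parameters are assumptions for illustration and are not taken from review #14921.

```python
def cpu_util_percent(prev_cpu_ns, prev_wall_s, cur_cpu_ns, cur_wall_s, vcpus=1):
    """CPU utilisation between two polls, as a percentage of available CPU time.

    prev_cpu_ns / cur_cpu_ns: cumulative CPU time in nanoseconds at each poll.
    prev_wall_s / cur_wall_s: wall-clock time of each poll, in seconds.
    Returns None when no time has elapsed or the counter went backwards
    (for example after the instance was restarted).
    """
    elapsed_ns = (cur_wall_s - prev_wall_s) * 10**9
    if elapsed_ns <= 0 or cur_cpu_ns < prev_cpu_ns:
        return None
    return 100.0 * (cur_cpu_ns - prev_cpu_ns) / (elapsed_ns * vcpus)


# 300 s of CPU time consumed over a 600 s polling interval on one vCPU.
print(cpu_util_percent(0, 0, 300 * 10**9, 600))
```

Both the previous CPU time and the previous poll time must be remembered somewhere, which is the persistence question raised in the quoted text.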
[Openstack] [ceilometer] meter data volume
Hi Julien,

Sorry to bother you. I am still testing ceilometer, and I am confused about the meter volume in mongodb. Let's talk about cpu usage.

After I create and boot a vm named vm_1, a meter data record for cpu usage is inserted into the db every cycle (default 10 minutes). For example, the 'counter_volume' of the first record is '5206000', and the second one is '12389000'.

1) '12389000' nanoseconds means '123.89' seconds, or two minutes, but it seems to be 1238.9 seconds actually. Is there something wrong?

2) If I never reboot or suspend vm_1, will the 'counter_volume' of the cpu usage record increase all the time? Like '8 minutes' - '18 minutes' - '28 minutes'?

3) If I reboot or suspend vm_1, I find that the 'counter_volume' of the cpu usage record counts from zero again, like '8 minutes' - '18 minutes' - '28 minutes' [- '0 minutes'] - '5 minutes' - '15 minutes'. Does that mean 'counter_volume' just represents how long vm_1 has been booted up?

4) This one is about the Web API. I find that GET /v1/resources/(resource)/meters/(meter)/volume/sum just returns the sum of all the cpu 'counter_volume' values, like '8 minutes' + '18 minutes'. Is it reduplicate?

5) If I want to know how long vm_1's cpu was used yesterday, how can I do that?

It seems that I have too many questions. Thank you very much!

---
Yawei Wu
Dalian Hi-Think Computer Technology, Corp.
Re: [Openstack] [ceilometer] meter data volume
On Wed, Oct 31 2012, 吴亚伟 wrote:

> 1) '12389000' nanoseconds means '123.89' seconds, or two minutes, but it seems to be 1238.9 seconds actually. Is there something wrong?

Why do you think it's 1238.9 seconds?

> 2) If I never reboot or suspend vm_1, will the 'counter_volume' of the cpu usage record increase all the time? Like '8 minutes' - '18 minutes' - '28 minutes'?

It's a CPU time, not an uptime.
http://en.wikipedia.org/wiki/CPU_time

> 3) If I reboot or suspend vm_1, I find that the 'counter_volume' of the cpu usage record counts from zero again, like '8 minutes' - '18 minutes' - '28 minutes' [- '0 minutes'] - '5 minutes' - '15 minutes'. Does that mean 'counter_volume' just represents how long vm_1 has been booted up?

Not at all. It means the CPU time consumed is reset to 0, but that's not an issue in itself; the API should be capable of dealing with that if you ask for the total usage.

> 4) This one is about the Web API. I find that GET /v1/resources/(resource)/meters/(meter)/volume/sum just returns the sum of all the cpu 'counter_volume' values, like '8 minutes' + '18 minutes'. Is it reduplicate?

Don't understand what you mean, but the CPU counter is a cumulative one, and asking for its sum is nonsense. You want to ask for (max - min) to get the used value, something which is not in the API yet.

> 5) If I want to know how long vm_1's cpu was used yesterday, how can I do that?

Just like I wrote above. :)

--
Julien Danjou
;; Free Software hacker freelance
;; http://julien.danjou.info
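The arithmetic behind these answers can be sketched as below, assuming `counter_volume` for the cpu meter really is CPU time in nanoseconds, as the question states. The helper names `ns_to_seconds` and `used_between` are invented for this example and are not API calls.

```python
NS_PER_SECOND = 10**9


def ns_to_seconds(counter_volume_ns):
    """Convert a CPU time expressed in nanoseconds to seconds."""
    return counter_volume_ns / NS_PER_SECOND


def used_between(samples):
    """CPU time (ns) consumed over a window of cumulative samples.

    This is the (max - min) rule, valid only while the counter does not
    reset inside the window.
    """
    return max(samples) - min(samples)


samples = [5206000, 12389000]        # the two counter_volume values quoted above

print(ns_to_seconds(samples[1]))     # CPU seconds accumulated at the second poll
print(used_between(samples))         # nanoseconds consumed between the two polls
print(sum(samples))                  # what /volume/sum adds up: not a usage figure
```

If the unit is exactly nanoseconds, 12389000 corresponds to roughly 0.012 s of CPU time, which is neither 123.89 s nor 1238.9 s. For question 5, the same (max - min) rule would be applied to the samples whose timestamps fall within yesterday.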
Re: [Openstack] [ceilometer] meter data volume
Hi Yawei Wu,

The root of the confusion is the fact that the cpu meter is reporting the cumulative cpu_time stat from libvirt. This libvirt counter is reset when the associated qemu process is restarted (an artifact of how cpuacct works). So when you stop/start or suspend/resume, a fresh qemu process is sparked up and the cumulative time is reset.

Thanks for bringing this up, as it has implications for how we meter CPU time and utilization[1]. We may need to start metering the delta between CPU times on subsequent polling cycles, instead of using a cumulative meter (dealing with the edge case where the instance has been restarted within a polling period).

Cheers,
Eoghan

[1] https://review.openstack.org/14921
Re: [Openstack] [ceilometer] meter data volume
> Not at all. It means the CPU time consumed is reset to 0, but that's not an issue in itself; the API should be capable of dealing with that if you ask for the total usage.

Would that total usage be much more apparent if we started metering the delta between CPU times on subsequent polling periods as a gauge measure? (As opposed to treating it as a cumulative measure.)

>> /v1/resources/(resource)/meters/(meter)/volume/sum just returns the sum of all the cpu 'counter_volume' values, like '8 minutes' + '18 minutes'. Is it reduplicate?
>
> Don't understand what you mean, but the CPU counter is a cumulative one, and asking for its sum is nonsense. You want to ask for (max - min) to get the used value, something which is not in the API yet.

I don't think (max - min) would suffice to give an accurate measure of the actual CPU time used, as the counter may have reset multiple times in the course of the requested duration.

Cheers,
Eoghan
Re: [Openstack] [ceilometer] meter data volume
On Wed, Oct 31 2012, Eoghan Glynn wrote:

> Would that total usage be much more apparent if we started metering the delta between CPU times on subsequent polling periods as a gauge measure? (As opposed to treating it as a cumulative measure.)

I'm rather against the idea of transforming all cumulative counters to delta, for the simple reason that this implies losing information if your system is not launched to compute delta, or that you have to maintain a previous value across restarts.

The API will be capable of doing the operation you need, no matter what the type of counter is (delta or cumulative).

> I don't think (max - min) would suffice to give an accurate measure of the actual CPU time used, as the counter may have reset multiple times in the course of the requested duration.

It does, because /max in the API should be aware of the fact that a reset can occur, and compute accordingly. We started to discuss this a bit in:

https://bugs.launchpad.net/ceilometer/+bug/1061817

--
Julien Danjou
# Free Software hacker freelance
# http://julien.danjou.info
Re: [Openstack] [ceilometer] meter data volume
>> I don't think (max - min) would suffice to give an accurate measure of the actual CPU time used, as the counter may have reset multiple times in the course of the requested duration.
>
> It does, because /max in the API should be aware of the fact that a reset can occur, and compute accordingly. We started to discuss this a bit in:
>
> https://bugs.launchpad.net/ceilometer/+bug/1061817

A-ha, OK, so not so much (max - min) as:

  (\Sigma local maxima) - first

Sounds computationally expensive to produce on the fly, but maybe the local maxima can be efficiently recorded as the data is being ingested.

Cheers,
Eoghan
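A minimal sketch of the "(\Sigma local maxima) - first" rule, assuming that any drop between consecutive samples marks a counter reset. The function name `cumulative_usage` is hypothetical; this is not what the API implements.

```python
def cumulative_usage(samples):
    """Usage over a window of cumulative samples that may reset to zero.

    Each run of non-decreasing values contributes its last (largest) value;
    the first sample is subtracted to discount usage from before the window.
    """
    if not samples:
        return 0
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:              # reset: prev was the local maximum of its run
            total += prev
    total += samples[-1]            # local maximum of the final run
    return total - samples[0]


# The sequence from the original question, in minutes of CPU time.
minutes = [8, 18, 28, 0, 5, 15]
print(cumulative_usage(minutes))    # reset-aware usage
print(max(minutes) - min(minutes))  # plain (max - min), which ignores the reset
```

For that sequence the local maxima are 28 and 15, so the rule gives 28 + 15 - 8 = 35 minutes, whereas plain (max - min) gives 28. A single pass over time-ordered samples is enough, which suggests the maxima could also be tracked at ingestion time.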
Re: [Openstack] [ceilometer] meter data volume
On Wed, Oct 31 2012, Eoghan Glynn wrote:

> A-ha, OK, so not so much (max - min) as:
>
>   (\Sigma local maxima) - first

Yeah, excuse my math. :)

> Sounds computationally expensive to produce on the fly, but maybe the local maxima can be efficiently recorded as the data is being ingested.

Yes, it's more expensive in theory, but in practice I rather think that with a good back-end it's not a problem (either pre-compute or have the right tools, like PostgreSQL).

--
Julien Danjou
;; Free Software hacker freelance
;; http://julien.danjou.info
Re: [Openstack] [ceilometer] meter data volume
On Wed, Oct 31, 2012 at 10:23 AM, Eoghan Glynn egl...@redhat.com wrote:

> Hi Yawei Wu,
>
> The root of the confusion is the fact that the cpu meter is reporting the cumulative cpu_time stat from libvirt. This libvirt counter is reset when the associated qemu process is restarted (an artifact of how cpuacct works). So when you stop/start or suspend/resume, a fresh qemu process is sparked up and the cumulative time is reset.
>
> Thanks for bringing this up, as it has implications for how we meter CPU time and utilization[1]. We may need to start metering the delta between CPU times on subsequent polling cycles, instead of using a cumulative meter (dealing with the edge case where the instance has been restarted within a polling period).

Good idea. We need to capture this issue to make sure we get it onto the roadmap for this cycle. Is there a bug or blueprint for it yet?

Doug

> Cheers,
> Eoghan
>
> [1] https://review.openstack.org/14921
Re: [Openstack] [ceilometer] meter data volume
On Wed, Oct 31, 2012 at 11:25 AM, Eoghan Glynn egl...@redhat.com wrote:

>>> I don't think (max - min) would suffice to give an accurate measure of the actual CPU time used, as the counter may have reset multiple times in the course of the requested duration.
>>
>> It does, because /max in the API should be aware of the fact that a reset can occur, and compute accordingly. We started to discuss this a bit in:
>>
>> https://bugs.launchpad.net/ceilometer/+bug/1061817
>
> A-ha, OK, so not so much (max - min) as:
>
>   (\Sigma local maxima) - first
>
> Sounds computationally expensive to produce on the fly, but maybe the local maxima can be efficiently recorded as the data is being ingested.

Is that better than just reporting the data in a more easily digested format in the first place?

Julien, I don't understand your comment about losing data if your system is not launched to compute delta. Can you clarify what you mean there? I do understand that the agent would need to store state about the counter locally in order to track the delta value, but I think we could provide a convenient way for pollsters to do that without complicating them excessively.

Doug

> Cheers,
> Eoghan
Re: [Openstack] [ceilometer] meter data volume
On Wed, Oct 31 2012, Doug Hellmann wrote:

> Is that better than just reporting the data in a more easily digested format in the first place?

IMHO yes.

> Julien, I don't understand your comment about losing data if your system is not launched to compute delta. Can you clarify what you mean there? I do understand that the agent would need to store state about the counter locally in order to track the delta value, but I think we could provide a convenient way for pollsters to do that without complicating them excessively.

Yes, actually I think you got what I meant, I just wrote it badly. By system, I meant pollster. If your pollster is not running to compute delta and you have no state stored, you'll miss a part of what has been used.

Now, I don't think trying to circumvent that at the pollster level is a good idea either, because it complicates the pollster for no good reason.

Ultimately, if you really want to compute delta instead of using the real values of the counter for whatever (bad) reason, doing it at the storage back-end level might be an option if you want. But as I said, for now there's no reason it should be needed.

(And you know what they say, early optimization is the root of all evil :)

--
Julien Danjou
// Free Software hacker freelance
// http://julien.danjou.info
Re: [Openstack] [ceilometer] meter data volume
> If your pollster is not running to compute delta and you have no state stored, you'll miss a part of what has been used.

Would we also have some 'misses' with the cumulative approach when the ceilometer agent was down?

If I understood the (\Sigma local maxima)-first idea correctly, the usage up to the first polling cycle would always be discounted from any duration.

Similarly, calculating the time delta as a gauge measure would discount only the usage up to the first libvirt poll after each ceilo agent restart.

As long as the ceilo compute agent was restarted only rarely, I'm not sure the under-reporting would be a huge issue in either case.

A more pernicious problem would occur if the instance was being regularly restarted with a higher frequency than the polling period.

Cheers,
Eoghan
Re: [Openstack] [ceilometer] meter data volume
On Wed, Oct 31 2012, Eoghan Glynn wrote:

> Would we also have some 'misses' with the cumulative approach when the ceilometer agent was down?

No, unless the counter resets several times while your agent is down. But delta has the same issue.

> If I understood the (\Sigma local maxima)-first idea correctly, the usage up to the first polling cycle would always be discounted from any duration.

No, because if you have:

Time | Value
   0 |  10
   1 |  30
   2 |  50
   3 |  80
   4 | 100

If your delta-pollster is down at 1 and 2 and you restart it at 3, then at 4 you'll send 20 as usage (100 minus 80). So you miss the delta between 10 (time 0) and 80 (time 3) (therefore 70 for free!). If you send 80 right away at time 3 when restarting, the API will be able to guess that between 0 and 3 the value went from 10 to 80.

With the delta approach, the API cannot guess that.

> A more pernicious problem would occur if the instance was being regularly restarted with a higher frequency than the polling period.

Yes, but in that case, whatever counting method you use, you're screwed. :)

--
Julien Danjou
-- Free Software hacker freelance
-- http://julien.danjou.info
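The table above can be replayed in a short simulation, assuming a delta pollster that keeps its previous reading only in memory. Both function names are invented for this sketch.

```python
READINGS = {0: 10, 1: 30, 2: 50, 3: 80, 4: 100}   # Time -> Value, from the table above


def delta_pollster_total(readings, up_times, restart_at):
    """Total usage reported by a delta pollster whose previous value lives
    in memory only, so it is forgotten when the agent restarts."""
    previous = None
    total = 0
    for t in sorted(up_times):
        if t == restart_at:
            previous = None             # in-memory state is gone after a restart
        current = readings[t]
        if previous is not None:
            total += current - previous
        previous = current
    return total


def cumulative_total(readings, up_times):
    """Usage derivable by the API when raw cumulative values are sent."""
    seen = [readings[t] for t in sorted(up_times)]
    return seen[-1] - seen[0]           # no counter reset in this example


up_times = [0, 3, 4]                    # agent down at times 1 and 2
print(delta_pollster_total(READINGS, up_times, restart_at=3))
print(cumulative_total(READINGS, up_times))
```

The delta pollster reports 20 while the cumulative samples allow 90 to be derived; the difference is the 70 that goes unbilled in the example.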
Re: [Openstack] [ceilometer] meter data volume
>> Would we also have some 'misses' with the cumulative approach when the ceilometer agent was down?
>
> No, unless the counter resets several times while your agent is down. But delta has the same issue.
>
>> If I understood the (\Sigma local maxima)-first idea correctly, the usage up to the first polling cycle would always be discounted from any duration.
>
> No, because if you have:
>
> Time | Value
>    0 |  10
>    1 |  30
>    2 |  50
>    3 |  80
>    4 | 100
>
> If your delta-pollster is down at 1 and 2 and you restart it at 3, then at 4 you'll send 20 as usage (100 minus 80). So you miss the delta between 10 (time 0) and 80 (time 3) (therefore 70 for free!). If you send 80 right away at time 3 when restarting, the API will be able to guess that between 0 and 3 the value went from 10 to 80.
>
> With the delta approach, the API cannot guess that.

Yep, the sum of local maxima is not lossy as long as the requested duration completely encapsulates the compute agent outage (and the instance doesn't restart during the outage).

However, I was thinking more of the scenario where the duration requested via the API is, say, t1..t4 in your example above.

In any case, do we need a new measurement type, in addition to the existing CUMULATIVE type, that captures the non-monotonic nature of the measure and alerts the API that special handling is required to compute, say, max-min?

Something like TRANSIENT_CUMULATIVE, if that's not too much of a mouthful.

>> A more pernicious problem would occur if the instance was being regularly restarted with a higher frequency than the polling period.
>
> Yes, but in that case, whatever counting method you use, you're screwed. :)

True that.

Cheers,
Eoghan