Re: [Openstack] [ceilometer] meter data volume

2012-11-01 Thread 吴亚伟

On 2012-10-31 21:56, Julien Danjou wrote:

On Wed, Oct 31 2012, 吴亚伟 wrote:


1) '12389000' nanoseconds means '123.89' seconds or two minutes,it
seem like to be 1238.9 seconds actually, is there something wrong ?

Why do you think it's 1238.9 seconds?


Well, to be honest, I really did think it was an uptime...


3) If I reboot or suspend vm_1, I find that the 'counter_volume' of cpu
usage record will count from zero. Just like '8 minutes' - '18 minutes'
- '28 minutes' [- '0 minutes'] -'5 minutes' - '15 minutes'. Does it
mean that 'counter_volume' just represents how long has vm_1 been booted
up ?

Not at all. It means the CPU time consumed is reset to 0, but that's not
an issue in itself, the API should be capable to deal with that if you
ask for the total usage.


Is the API capable of dealing with that at present? If not, when will it be?




4) This one is about Web API. I find that GET
/v1/resources/(resource)/meters/(meter)/volume/sum just return the sum
value of all the cpu 'counter_volume', like '8 minutes' + '18 minutes'.
Is it reduplicate ?

Don't understand what you mean, but the CPU counter is a cumulative one,
and asking for its sum is a non-sense. You want to ask for (max - min)
to get the used value, something which is not in the API yet.



Is the Web API in the documentation going to be updated soon?


Thanks

---
Yawei Wu
Dalian Hi-Think Computer Technology,Corp.


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [ceilometer] meter data volume

2012-11-01 Thread 吴亚伟

Hi Eoghan,

Thanks for your reply. As we can see from the document:
-

Three types of meters are defined in ceilometer:

Type        Definition
Cumulative  Increasing over time (instance hours)
Gauge       Discrete items (floating IPs, image uploads) and fluctuating
            values (disk I/O)
Delta       Changing over time (bandwidth)



The cumulative type is clear, but even with these descriptions the gauge
and delta types confuse me.

Could you explain them with examples or by sharing a use case?


Thanks

---
Yawei Wu
Dalian Hi-Think Computer Technology,Corp.




Hi Yawei Wu,

The root of the confusion is the fact the cpu meter is reporting
the cumulative cpu_time stat from libvirt. This libvirt counter is
reset when the associated qemu process is restarted (an artifact
of how cpuacct works).

So when you stop/start or suspend/resume, a fresh qemu process
is sparked up, then the cumulative time is reset.

Thanks for bringing this up, as it has implications as to how
we meter CPU time and utilization[1].

We may need to start metering the delta between CPU times on
subsequent polling cycles, instead of using a cumulative meter
(dealing with the edge case where the instance has been restarted
within a polling period).

Cheers,
Eoghan

[1] https://review.openstack.org/14921





Re: [Openstack] [ceilometer] meter data volume

2012-11-01 Thread Julien Danjou
On Wed, Oct 31 2012, Eoghan Glynn wrote:

 Yep the sum of local maxima is not lossy as long as the requested
 duration completely encapsulates the compute agent outage (and the
 instance doesn't restart during the outage).

Actually, if there's one restart, it still _can_ be safe in certain
circumstances, such as in a case like:

Time | Value
0    | 1000
1    | 3000 (agent down)
2    | 0    (agent down)
3    | 80
4    | 100

In this particular case, where your agent was down at t1 and t2, the
API will detect that the counter was reset while the agent was down.
With the cumulative model, the loss is again smaller than with the
computed delta model.

OTOH, both models fail to capture some data in a case like:

Time | Value
0    | 1000
1    | 3000 (agent down)
2    | 0    (agent down)
3    | 8000
4    | 1

 However I was more thinking of the scenario where the duration
 requested  via the API is say t1..t4 in your example above.

 In any case, do we need a new measurement type, in addition to the
 existing CUMULATIVE type, that captures the non-monotonic nature of
 the measure and alerts the API that special handling is required to
 compute say max-min?

 Something like TRANSIENT_CUMULATIVE, if that's not too much of a
 mouthful.

We already discussed it with Doug, and came to the conclusion that we
don't, because the monotonic case is just a special case of the
non-monotonic one. So applying the computing method for the non-monotonic
case will solve all problems.

-- 
Julien Danjou
-- Free Software hacker  freelance
-- http://julien.danjou.info




Re: [Openstack] [ceilometer] meter data volume

2012-11-01 Thread Julien Danjou
On Thu, Nov 01 2012, 吴亚伟 wrote:

 Is the API capable to deal with that at present? If not, when?

Not yet. When someone writes the code!

 Is the Web API in the document going to be updated recently?

We hope so!

-- 
Julien Danjou
-- Free Software hacker  freelance
-- http://julien.danjou.info




Re: [Openstack] [ceilometer] meter data volume

2012-11-01 Thread Julien Danjou
On Thu, Nov 01 2012, 吴亚伟 wrote:

 Cumulative type is apparent, while even with descriptions gauge and delta
 type confuse me.
 Could you explain them through examples or by sharing an use case?

Gauge is an absolute value, like a temperature or the number of people
in a room.

Delta is a counter where each value is the difference between the
current and the previous value. Each value represents how many things
were consumed since the last time a value was sent.
It can be compared to a counter that resets to 0 each time you read it.
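
To make the three types concrete, here is a minimal Python sketch of how a consumer might total each kind of series. The function name `total_usage` and the sample values are assumptions made up for illustration; this is not part of the Ceilometer API.

```python
def total_usage(meter_type, samples):
    """Summarise a time-ordered list of numeric samples for one meter.

    Hypothetical helper for illustration only.
    """
    if not samples:
        return 0
    if meter_type == "gauge":
        # Absolute readings (temperature, people in a room): adding them
        # up is meaningless, so report the most recent reading.
        return samples[-1]
    if meter_type == "delta":
        # Each sample already means "consumed since the last sample".
        return sum(samples)
    if meter_type == "cumulative":
        # A running total: usage over the window is last minus first.
        # This simple form assumes the counter never reset in the window.
        return samples[-1] - samples[0]
    raise ValueError("unknown meter type: %r" % (meter_type,))


print(total_usage("gauge", [3, 5, 4]))         # 4
print(total_usage("delta", [10, 10, 10]))      # 30
print(total_usage("cumulative", [8, 18, 28]))  # 20
```

The point of the sketch is only that the same list of numbers has to be read differently depending on the declared type.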

-- 
Julien Danjou
// Free Software hacker  freelance
// http://julien.danjou.info




Re: [Openstack] [ceilometer] meter data volume

2012-11-01 Thread Doug Hellmann
On Wed, Oct 31, 2012 at 1:39 PM, Julien Danjou jul...@danjou.info wrote:

 On Wed, Oct 31 2012, Eoghan Glynn wrote:

  Would we have also have some 'misses' with the cumulative approach
  when the ceilometer agent was down?

 No, unless the counter resets several times while your agent is down.
 But delta has the same issue.

  If I understood the (\Sigma local maxima)-first idea correctly,
  the usage up to the first polling cycle would always be
  discounted from any duration.

 No, because if you have:

 Time | Value
 0    | 10
 1    | 30
 2    | 50
 3    | 80
 4    | 100

 If your delta-pollster is down at 1 and 2, you restart at 3, therefore
 at 4 you'll send 20 as usage (100 minus 80). So you miss the delta
 between 10 (time 0) and 80 (time 3) (therefore 70 for free!).
 If you send right away 80 at time 3 when restarting, the API will be
 able to guess that between 0 and 3 the value went from 10 to 80.
 With delta approach, the API cannot guess that.


Sure it can, you just need to move where the caching is done. Using a local
cache to maintain the previous time a value was published you would know at
time 3 that the last published value was 10, and so send 70. So the total
will be correct.

Doug


Re: [Openstack] [ceilometer] meter data volume

2012-11-01 Thread Eoghan Glynn


  if you have:
  
  Time | Value
  0 | 10
  1 | 30
  2 | 50
  3 | 80
  4 | 100
  
  If your delta-pollster is down at 1 and 2, you restart at 3,
  therefore at 4 you'll send 20 as usage (100 minus 80).
  So you miss the delta between 10 (time 0) and 80 (time 3)
  (therefore 70 for free!).  If you send right away 80 at
  time 3 when restarting, the API will be able to guess that
  between 0 and 3 the value went from 10 to 80.  With delta
  approach, the API cannot guess that.

 
 Sure it can, you just need to move where the caching is done. Using a
 local cache to maintain the previous time a value was published you
 would know at time 3 that the last published value was 10, and so
 send 70. So the total will be correct.

Good point, previously IIUC there was an implicit assumption that
any prev time caching would be done in-memory, hence lost across
process restarts.

But as you point out, these data could be persisted locally by the
compute agent.

What would be the best way to achieve this? A small sqlite DB 
per-agent, or even simpler just a pickled dict? The latter would
avoid the complexity of DB versioning and migration.
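
A rough sketch of the pickled-dict option, to show how a previous value could survive an agent restart. The class name `PrevValueCache`, its `delta` method and the file location are hypothetical; nothing like this exists in the agent today, and a real version would need atomic writes and locking.

```python
import os
import pickle
import tempfile


class PrevValueCache(object):
    """Hypothetical helper: remembers the last published value per resource."""

    def __init__(self, path):
        self.path = path
        self.values = {}
        if os.path.exists(path):
            # Reload whatever a previous agent process left behind.
            with open(path, "rb") as f:
                self.values = pickle.load(f)

    def delta(self, resource_id, current):
        """Return usage since the last published value, then persist it."""
        previous = self.values.get(resource_id)
        self.values[resource_id] = current
        with open(self.path, "wb") as f:
            pickle.dump(self.values, f)
        if previous is None:
            return None      # first ever reading: nothing to compare against
        if current < previous:
            return current   # counter was reset, so count from zero
        return current - previous


path = os.path.join(tempfile.mkdtemp(), "prev.pickle")

agent = PrevValueCache(path)
agent.delta("vm_1", 10)             # time 0: baseline only
# The agent is down at times 1 and 2; a new process starts at time 3.
agent = PrevValueCache(path)
sent = [agent.delta("vm_1", 80),    # time 3
        agent.delta("vm_1", 100)]   # time 4
print(sent)                         # [70, 20]
```

With the value persisted, the deltas sent after the restart add up to 90, the same figure the cumulative counter gives for time 0 to time 4.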

Cheers,
Eoghan



Re: [Openstack] [ceilometer] meter data volume

2012-11-01 Thread Doug Hellmann
On Thu, Nov 1, 2012 at 1:31 PM, Eoghan Glynn egl...@redhat.com wrote:



   if you have:
  
   Time | Value
   0 | 10
   1 | 30
   2 | 50
   3 | 80
   4 | 100
  
   If your delta-pollster is down at 1 and 2, you restart at 3,
   therefore at 4 you'll send 20 as usage (100 minus 80).
   So you miss the delta between 10 (time 0) and 80 (time 3)
   (therefore 70 for free!).  If you send right away 80 at
   time 3 when restarting, the API will be able to guess that
   between 0 and 3 the value went from 10 to 80.  With delta
   approach, the API cannot guess that.
 
 
  Sure it can, you just need to move where the caching is done. Using a
  local cache to maintain the previous time a value was published you
  would know at time 3 that the last published value was 10, and so
  send 70. So the total will be correct.

 Good point, previously IIUC there was an implicit assumption that
 any prev time caching would be done in-memory, hence lost across
 process restarts.

 But as you point out, these data could be persisted locally by the
 compute agent.

 What would be the best way to achieve this? A small sqlite DB
 per-agent, or even simpler just a pickled dict? The latter would
 avoid the complexity of DB versioning and migration.


I discussed this issue at the summit with James Penick of Yahoo, and he
showed me some code in their agent that is using a sqlite db. We will want
to build a nice API so pollsters can use the cache without having to worry
about how it is implemented, which would let us deal with any versioning
issues in a central spot.

Doug


Re: [Openstack] [ceilometer] meter data volume

2012-11-01 Thread Julien Danjou
On Thu, Nov 01 2012, Eoghan Glynn wrote:

 What would be the best way to achieve this? A small sqlite DB 
 per-agent, or even simpler just a pickled dict? The latter would
 avoid the complexity of DB versioning and migration.

At the risk of repeating myself, can I stress again how much we don't
need to transform cumulative into delta, and certainly not in the
pollster/agents/notifications code?

-- 
Julien Danjou
/* Free Software hacker  freelance
   http://julien.danjou.info */




Re: [Openstack] [ceilometer] meter data volume

2012-11-01 Thread Julien Danjou
On Thu, Nov 01 2012, Eoghan Glynn wrote:

 Well local persistence of the prev times would still be of use I think
 for the in-pollster CPU util % calculation (as discussed here[1]).

All right, so that may be needed, but not in the pollster. It'll be
needed in the CW publisher that will compute that, because CW doesn't
offer any other means (I imagine) of handling cumulative values like
Ceilometer does. :)

(But I know that currently we don't have multi-publisher, and that's why
it's done that way in #14921 :)

-- 
Julien Danjou
// Free Software hacker  freelance
// http://julien.danjou.info




[Openstack] [ceilometer] meter data volume

2012-10-31 Thread 吴亚伟
Hi Julien,

Sorry to bother you.

I am still testing ceilometer. I am confused about the meter volume
in mongodb. Let's talk about cpu usage.

After I create and boot a vm named vm_1, a meter data record for cpu
usage is inserted into the db every polling cycle (default 10 minutes).
For example, the 'counter_volume' of the first record is '5206000', and
that of the second one is '12389000'.

1) '12389000' nanoseconds means '123.89' seconds, or about two minutes,
but it seems to actually be 1238.9 seconds. Is there something wrong?

2) If I never reboot or suspend vm_1, will the 'counter_volume' of the
cpu usage record keep increasing? Like '8 minutes' - '18 minutes' -
'28 minutes'?

3) If I reboot or suspend vm_1, I find that the 'counter_volume' of the
cpu usage record starts counting from zero again, like '8 minutes' -
'18 minutes' - '28 minutes' [- '0 minutes'] - '5 minutes' - '15 minutes'.
Does that mean that 'counter_volume' just represents how long vm_1 has
been booted up?

4) This one is about the Web API. I find that GET
/v1/resources/(resource)/meters/(meter)/volume/sum just returns the sum
of all the cpu 'counter_volume' values, like '8 minutes' + '18 minutes'.
Is that counting the same usage twice?

5) If I want to know how long vm_1's cpu was in use yesterday, how
can I find out?

It seems that I have too many questions..

Thank you very much !


---
Yawei Wu
Dalian Hi-Think Computer Technology,Corp.



Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Julien Danjou
On Wed, Oct 31 2012, 吴亚伟 wrote:

 1) '12389000' nanoseconds means '123.89' seconds or two minutes,it
 seem like to be 1238.9 seconds actually, is there something wrong ?

Why do you think it's 1238.9 seconds?

 2) If I never reboot or suspend vm_1, will the 'counter_volume' of cpu
 usage record increase all the time ? Just like '8 minutes' - '18
 minutes' - '28 minutes' ?

It's a CPU time, not an uptime.

http://en.wikipedia.org/wiki/CPU_time

 3) If I reboot or suspend vm_1, I find that the 'counter_volume' of cpu
 usage record will count from zero. Just like '8 minutes' - '18 minutes'
 - '28 minutes' [- '0 minutes'] -'5 minutes' - '15 minutes'. Does it
 mean that 'counter_volume' just represents how long has vm_1 been booted
 up ?

Not at all. It means the CPU time consumed is reset to 0, but that's not
an issue in itself; the API should be capable of dealing with that if you
ask for the total usage.

 4) This one is about Web API. I find that GET
 /v1/resources/(resource)/meters/(meter)/volume/sum just return the sum
 value of all the cpu 'counter_volume', like '8 minutes' + '18 minutes'.
 Is it reduplicate ?

I don't understand what you mean, but the CPU counter is a cumulative one,
and asking for its sum is nonsense. You want to ask for (max - min)
to get the used value, something which is not in the API yet.

 5) If I want to know how long has vm_1's cpu been used yesterday, how
 can I do ?

Just like I wrote above. :)
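
For reference, a small Python sketch of the arithmetic behind questions 1 and 5. It assumes, as the question states, that `counter_volume` for the cpu meter is in nanoseconds, and that the counter did not reset inside the window; the helper names are made up for illustration.

```python
NS_PER_SECOND = 10 ** 9


def ns_to_seconds(nanoseconds):
    """Convert a nanosecond count to seconds."""
    return nanoseconds / float(NS_PER_SECOND)


def cpu_time_used(samples_ns):
    """CPU seconds used across a window, as (max - min).

    Only valid while the counter never resets inside the window.
    """
    return ns_to_seconds(max(samples_ns) - min(samples_ns))


samples = [5206000, 12389000]        # the two records from the question
print(ns_to_seconds(samples[1]))     # about 0.0124 seconds
print(cpu_time_used(samples))        # about 0.0072 seconds
```

Read as nanoseconds, 12389000 is roughly 12 milliseconds, so neither 123.89 nor 1238.9 seconds matches that unit.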

-- 
Julien Danjou
;; Free Software hacker  freelance
;; http://julien.danjou.info




Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Eoghan Glynn


Hi Yawei Wu,

The root of the confusion is the fact that the cpu meter is reporting
the cumulative cpu_time stat from libvirt. This libvirt counter is
reset when the associated qemu process is restarted (an artifact
of how cpuacct works).

So when you stop/start or suspend/resume, a fresh qemu process
is sparked up and the cumulative time is reset.

Thanks for bringing this up, as it has implications as to how
we meter CPU time and utilization[1].

We may need to start metering the delta between CPU times on 
subsequent polling cycles, instead of using a cumulative meter
(dealing with the edge case where the instance has been restarted
within a polling period).

Cheers,
Eoghan

[1] https://review.openstack.org/14921




Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Eoghan Glynn

 Not at all. It means the CPU time consumed is reset to 0, but
 that's not an issue in itself, the API should be capable to
 deal with that if you ask for the total usage.

Would that total usage be much more apparent if we started
metering the delta between CPU times on subsequent polling
periods as a gauge measure? (As opposed to treating it as
a cumulative measure)


  /v1/resources/(resource)/meters/(meter)/volume/sum just
  return the sum value of all the cpu 'counter_volume', like '8
  minutes' + '18 minutes'.  Is it reduplicate ?

 Don't understand what you mean, but the CPU counter is a
 cumulative one, and asking for its sum is a non-sense. You want
 to ask for (max - min) to get the used value, something which
 is not in the API yet.

I don't think (max - min) would suffice to give an accurate
measure of the actual CPU time used, as the counter may have
reset multiple times in the course of the requested duration.

Cheers,
Eoghan



Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Julien Danjou
On Wed, Oct 31 2012, Eoghan Glynn wrote:

 Would that total usage be much more apparent if we started
 metering the delta between CPU times on subsequent polling
 periods as a gauge measure? (As opposed to treating it as
 a cumulative measure)

I'm rather against the idea of transforming all cumulative counters to
delta, for the simple reason that this implies losing information if your
system is not running to compute the delta, or that you have to maintain a
previous value across restarts.
The API will be capable of doing the operation you need, no matter what the
type of counter is (delta or cumulative).

 I don't think (max - min) would suffice to give an accurate
 measure of the actual CPU time used, as the counter may have
 reset multiple times in the course of the requested duration.

It would, because /max in the API should be aware of the fact that a reset
can occur and compute accordingly. We started to discuss this a bit in:

  https://bugs.launchpad.net/ceilometer/+bug/1061817

-- 
Julien Danjou
# Free Software hacker  freelance
# http://julien.danjou.info




Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Eoghan Glynn


  I don't think (max - min) would suffice to give an accurate
  measure of the actual CPU time used, as the counter may have
  reset multiple times in the course of the requested duration.
 
 It is, because /max in the API should be aware of the fact a 
 reset can occur and computes accordingly. We started to discuss
 this a bit in:
 
   https://bugs.launchpad.net/ceilometer/+bug/1061817

A-ha, OK, so not so much (max - min) as:

   (\Sigma local maxima) - first

Sounds computationally expensive to produce on the fly, but maybe
the local maxima can be efficiently recorded as the data is being
ingested.
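
As a sanity check on the formula, a minimal Python sketch of (\Sigma local maxima) - first over a time-ordered list of samples. It assumes any drop between consecutive samples is a counter reset; the function name is made up, and this is not what the API does today.

```python
def cumulative_usage(samples):
    """(sum of local maxima) - first, for a time-ordered cumulative series.

    A drop between consecutive samples is treated as a counter reset.
    """
    if not samples:
        return 0
    total_of_maxima = 0
    previous = samples[0]
    for value in samples[1:]:
        if value < previous:
            # Reset detected: the previous sample was a local maximum.
            total_of_maxima += previous
        previous = value
    total_of_maxima += previous  # the final sample closes the last segment
    return total_of_maxima - samples[0]


# '8' - '18' - '28' [- '0'] - '5' - '15' minutes, as in the original question
print(cumulative_usage([8, 18, 28, 0, 5, 15]))  # 35
print(cumulative_usage([8, 18, 28]))            # 20
```

It is a single pass over the samples, and with no reset it reduces to last minus first.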

Cheers,
Eoghan



Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Julien Danjou
On Wed, Oct 31 2012, Eoghan Glynn wrote:

 A-ha, OK, so not so much (max - min) as:

(\Sigma local maxima) - first

Yeah, excuse my math. :)

 Sounds computationally expensive to produce on the fly, but maybe
 the local maxima can be efficiently recorded as the data is being
 ingested.

Yes, it's more expensive in theory, but in practice I'm rather confident
that with a good back-end it's not a problem (either pre-compute or have
the right tools, like PostgreSQL).

-- 
Julien Danjou
;; Free Software hacker  freelance
;; http://julien.danjou.info




Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Doug Hellmann
On Wed, Oct 31, 2012 at 10:23 AM, Eoghan Glynn egl...@redhat.com wrote:



 Hi Yawei Wu,

 The root of the confusion is the fact the cpu meter is reporting
 the cumulative cpu_time stat from libvirt. This libvirt counter is
 reset when the associated qemu process is restarted (an artifact
 of how cpuacct works).

 So when you stop/start or suspend/resume, a fresh qemu process
 is sparked up, then the cumulative time is reset.

 Thanks for bringing this up, as it has implications as to how
 we meter CPU time and utilization[1].

 We may need to start metering the delta between CPU times on
 subsequent polling cycles, instead of using a cumulative meter
 (dealing with the edge case where the instance has been restarted
 within a polling period).


Good idea. We need to capture this issue to make sure we get it onto the
roadmap for this cycle. Is there a bug or blueprint for it yet?

Doug



 Cheers,
 Eoghan

 [1] https://review.openstack.org/14921




Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Doug Hellmann
On Wed, Oct 31, 2012 at 11:25 AM, Eoghan Glynn egl...@redhat.com wrote:



   I don't think (max - min) would suffice to give an accurate
   measure of the actual CPU time used, as the counter may have
   reset multiple times in the course of the requested duration.
 
  It is, because /max in the API should be aware of the fact a
  reset can occur and computes accordingly. We started to discuss
  this a bit in:
 
https://bugs.launchpad.net/ceilometer/+bug/1061817

 A-ha, OK, so not so much (max - min) as:

(\Sigma local maxima) - first

 Sounds computationally expensive to produce on the fly, but maybe
 the local maxima can be efficiently recorded as the data is being
 ingested.


Is that better than just reporting the data in a more easily digested
format in the first place?

Julien, I don't understand your comment about losing data if your system
is not launched to compute delta. Can you clarify what you mean there? I
do understand that the agent would need to store state about the counter
locally in order to track the delta value, but I think we could provide a
convenient way for pollsters to do that without complicating them
excessively.

Doug






Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Julien Danjou
On Wed, Oct 31 2012, Doug Hellmann wrote:

 Is that better than just reporting the data in a more easily digested
 format in the first place?

IMHO yes.

 Julien, I don't understand your comment about losing data if your system
 is not launched to compute delta. Can you clarify what you mean there? I
 do understand that the agent would need to store state about the counter
 locally in order to track the delta value, but I think we could provide a
 convenient way for pollsters to do that without complicating them
 excessively.

Yes, actually I think you got what I meant, I just wrote it badly. By
system, I meant pollster. If your pollster is not running to compute
delta and you have no state stored, you'll miss a part of what has been
used.

Now, I don't think trying to circumvent that at the pollster level is a
good idea either, because it complicates the pollster for no good reason.

Ultimately, if you really want to compute delta instead of using the
real values of the counter for whatever (bad) reason, doing it at the
storage back-end level might be an option if you want. But as I said,
for now there's no reason it should be needed.

(And you know what they say, early optimization is the root of all evil :)

-- 
Julien Danjou
// Free Software hacker  freelance
// http://julien.danjou.info




Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Eoghan Glynn


 If your pollster is not running to compute delta and you have
 no state stored, you'll miss a part of what has been used.

Would we also have some 'misses' with the cumulative approach
when the ceilometer agent was down?

If I understood the (\Sigma local maxima)-first idea correctly,
the usage up to the first polling cycle would always be
discounted from any duration.

Similarly, calculating the time delta as a gauge measure would
discount only the usage up to the first libvirt poll after each
ceilo agent restart.

As long as the ceilo compute agent was restarted only rarely, I'm
not sure the under-reporting would be a huge issue in either case.

A more pernicious problem would occur if the instance was being
regularly restarted with a higher frequency than the polling period.

Cheers,
Eoghan



Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Julien Danjou
On Wed, Oct 31 2012, Eoghan Glynn wrote:

 Would we have also have some 'misses' with the cumulative approach
 when the ceilometer agent was down?

No, unless the counter resets several times while your agent is down.
But delta has the same issue.

 If I understood the (\Sigma local maxima)-first idea correctly,
 the usage up to the first polling cycle would always be
 discounted from any duration.

No, because if you have:

Time | Value
0    | 10
1    | 30
2    | 50
3    | 80
4    | 100

If your delta-pollster is down at 1 and 2, you restart at 3, therefore
at 4 you'll send 20 as usage (100 minus 80). So you miss the delta
between 10 (time 0) and 80 (time 3) (therefore 70 for free!).
If you send right away 80 at time 3 when restarting, the API will be
able to guess that between 0 and 3 the value went from 10 to 80.
With delta approach, the API cannot guess that.
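
The same example as a small Python simulation, for anyone who wants to play with it. It models a delta pollster that keeps its previous value in memory only (so the value is lost when the process stops); the function names and the tuple layout are made up for illustration.

```python
# (time, counter value, agent running?) taken from the table above
readings = [(0, 10, True), (1, 30, False), (2, 50, False),
            (3, 80, True), (4, 100, True)]


def delta_total(readings):
    """Sum of the deltas sent by a pollster with in-memory state only."""
    total, previous = 0, None
    for _, value, agent_up in readings:
        if not agent_up:
            previous = None   # state is lost when the process stops
            continue
        if previous is not None:
            total += value - previous
        previous = value
    return total


def cumulative_total(readings):
    """Last minus first over the raw values stored while the agent was up."""
    stored = [value for _, value, agent_up in readings if agent_up]
    return stored[-1] - stored[0]


print(delta_total(readings))       # 20
print(cumulative_total(readings))  # 90
```

The 70 units of difference are exactly the usage between time 0 and time 3.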

 A more pernicious problem would occur if the instance was being
 regularly restarted with a higher frequency than the polling period.

Yes, but in that case, whatever counting method you use, you're screwed.
:)

-- 
Julien Danjou
-- Free Software hacker  freelance
-- http://julien.danjou.info




Re: [Openstack] [ceilometer] meter data volume

2012-10-31 Thread Eoghan Glynn


  Would we have also have some 'misses' with the cumulative approach
  when the ceilometer agent was down?
 
 No, unless the counter resets several times while your agent is down.
 But delta has the same issue.
 
  If I understood the (\Sigma local maxima)-first idea correctly,
  the usage up to the first polling cycle would always be
  discounted from any duration.
 
 No, because if you have:
 
 Time | Value
 0    | 10
 1    | 30
 2    | 50
 3    | 80
 4    | 100
 
 If your delta-pollster is down at 1 and 2, you restart at 3,
 therefore
 at 4 you'll send 20 as usage (100 minus 80). So you miss the delta
 between 10 (time 0) and 80 (time 3) (therefore 70 for free!).
 If you send right away 80 at time 3 when restarting, the API will be
 able to guess that between 0 and 3 the value went from 10 to 80.
 With delta approach, the API cannot guess that.

Yep the sum of local maxima is not lossy as long as the requested
duration completely encapsulates the compute agent outage (and the
instance doesn't restart during the outage).

However I was more thinking of the scenario where the duration
requested  via the API is say t1..t4 in your example above.

In any case, do we need a new measurement type, in addition to the
existing CUMULATIVE type, that captures the non-monotonic nature of
the measure and alerts the API that special handling is required to
compute say max-min?

Something like TRANSIENT_CUMULATIVE, if that's not too much of a
mouthful.

  A more pernicious problem would occur if the instance was being
  regularly restarted with a higher frequency than the polling
  period.
 
 Yes, but in that case, whatever counting method you use, you're
 screwed.
 :)

True that.

Cheers,
Eoghan
