Re: [Openstack] [metering] resource metadata changes and billing
On Thu, Jul 05 2012, Nick Barcet wrote:

> Makes sense?

Yes. And we can probably bend the current API a bit to enhance things. :)

--
Julien Danjou
;; Free Software hacker freelance
;; http://julien.danjou.info

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp
Re: [Openstack] [metering] resource metadata changes and billing
On Wed, Jul 4, 2012 at 1:52 PM, Nick Barcet nick.bar...@canonical.com wrote:

> On 06/29/2012 03:04 PM, Doug Hellmann wrote:
> [..]
>> My conclusion from all of this (over-)thinking is that the ceilometer API should assume the simple case and ignore the metadata changes when computing the sum or maximum value for a counter over a range of time. More complex processing will be left up to the caller, who can ask for raw metering data in manageable chunks and process them outside of the API. I could be persuaded to do something more complicated if the problems described above can be solved in a relatively simple way, but even then I think we should push that to the v2 API.
> [..]
>
> Sorry for my late reply on this, but...
>
> So, if I summarize what you are saying, the problem is that for a given Instance ID, a given meter may have to be interpreted as if the Instance ID was changing over time. Example:
>
> t1: Instance A - Has 1 CPU - 64G ram - runs in zone 1
> t2: Instance A - Has 2 CPU - 64G ram - runs in zone 1
> t3: Instance A - Has 2 CPU - 128G ram - runs in zone 1
> t4: Instance A - Has 2 CPU - 128G ram - runs in zone 3
> t5: Instance A is stopped
>
> From a billing point of view, what is important here is that even though the Instance ID remains the same, we have in fact 4 different segments of time which could lead to 4 different prices being applied to the same instance:
>
> t1-t2: price 1
> t2-t3: price 2
> t3-t4: price 3
> t4-t5: price 4
>
> So we need to be able to inform the rating engine that these events have occurred so that it does not uniformly apply a billing price to a sum of a given meter volume. But in fact this information is indeed captured and accessible to rating engines via their respective meters.

Yes, that is exactly it. Thank you for clarifying what I was trying to say. :-)

> What is interesting here is that, in my mind, the sum and duration functions of the API, when I proposed them, were only meant to be used:
>
> * in a simple amazon type billing model where instances cannot change zone, add CPU or add ram,
> * in a Private cloud scenario where you only need simple usage stats to inform your users,
> * in a horizon plugin to give a quick summary of use,
>
> and would never be used by any serious rating engine, which would in each and every case require access to the raw list of events so that it can recreate the full timeline of the events. This is where we need to draw the line between metering and rating.

I didn't realize that was your intent.

> I therefore propose that we leave the API as is, knowing the side effects of such high level sum and duration calculations. If we agree on this, I take the action to document the limitation of the summary functions of the API.

+1, and thank you for offering to document it.

Doug
Re: [Openstack] [metering] resource metadata changes and billing
On 06/29/2012 03:04 PM, Doug Hellmann wrote:
[..]
> My conclusion from all of this (over-)thinking is that the ceilometer API should assume the simple case and ignore the metadata changes when computing the sum or maximum value for a counter over a range of time. More complex processing will be left up to the caller, who can ask for raw metering data in manageable chunks and process them outside of the API. I could be persuaded to do something more complicated if the problems described above can be solved in a relatively simple way, but even then I think we should push that to the v2 API.
[..]

Sorry for my late reply on this, but...

So, if I summarize what you are saying, the problem is that for a given Instance ID, a given meter may have to be interpreted as if the Instance ID was changing over time. Example:

t1: Instance A - Has 1 CPU - 64G ram - runs in zone 1
t2: Instance A - Has 2 CPU - 64G ram - runs in zone 1
t3: Instance A - Has 2 CPU - 128G ram - runs in zone 1
t4: Instance A - Has 2 CPU - 128G ram - runs in zone 3
t5: Instance A is stopped

From a billing point of view, what is important here is that even though the Instance ID remains the same, we have in fact 4 different segments of time which could lead to 4 different prices being applied to the same instance:

t1-t2: price 1
t2-t3: price 2
t3-t4: price 3
t4-t5: price 4

So we need to be able to inform the rating engine that these events have occurred so that it does not uniformly apply a billing price to a sum of a given meter volume. But in fact this information is indeed captured and accessible to rating engines via their respective meters.

What is interesting here is that, in my mind, the sum and duration functions of the API, when I proposed them, were only meant to be used:

* in a simple amazon type billing model where instances cannot change zone, add CPU or add ram,
* in a Private cloud scenario where you only need simple usage stats to inform your users,
* in a horizon plugin to give a quick summary of use,

and would never be used by any serious rating engine, which would in each and every case require access to the raw list of events so that it can recreate the full timeline of the events. This is where we need to draw the line between metering and rating.

I therefore propose that we leave the API as is, knowing the side effects of such high level sum and duration calculations. If we agree on this, I take the action to document the limitation of the summary functions of the API.

Nick
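A minimal sketch of what a rating engine might do with the raw timeline in the t1..t5 example: rebuild the four segments and price each one separately instead of pricing a single sum. The timestamps, field names and the `hourly_price` rule are all made up for illustration; nothing here is part of the ceilometer API.

```python
from datetime import datetime

# Hypothetical timeline for one instance (values are illustrative only).
events = [
    (datetime(2012, 7, 1, 0), {"cpu": 1, "ram_gb": 64, "zone": 1}),    # t1
    (datetime(2012, 7, 1, 6), {"cpu": 2, "ram_gb": 64, "zone": 1}),    # t2
    (datetime(2012, 7, 1, 12), {"cpu": 2, "ram_gb": 128, "zone": 1}),  # t3
    (datetime(2012, 7, 1, 18), {"cpu": 2, "ram_gb": 128, "zone": 3}),  # t4
    (datetime(2012, 7, 2, 0), None),                                   # t5: stopped
]


def segments(events):
    """Pair each state with the interval during which it was in effect."""
    result = []
    for (start, state), (end, _) in zip(events, events[1:]):
        if state is not None:
            result.append((start, end, state))
    return result


def hourly_price(state):
    """Made-up rating rule; in practice this belongs to the rating engine."""
    zone_factor = 1.5 if state["zone"] == 3 else 1.0
    return (state["cpu"] * 1.0 + state["ram_gb"] * 0.01) * zone_factor


def bill(events):
    """Price every segment with the rate that applied during that segment."""
    total = 0.0
    for start, end, state in segments(events):
        hours = (end - start).total_seconds() / 3600
        total += hours * hourly_price(state)
    return total
```

With this data `segments(events)` yields four intervals, one per price in the example, which is exactly the information a summed meter volume would hide.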
Re: [Openstack] [metering] resource metadata changes and billing
On Wed, Jul 04 2012, Nick Barcet wrote:

> I therefore propose that we leave the API as is, knowing the side effects of such high level sum and duration calculations. If we agree on this, I take the action to document the limitation of the summary functions of the API.

So, if I understand you correctly, that would mean the API is unusable for anybody wanting to do fine-grained billing and they would have to connect to the database engine directly?

Because if that's the case I think we are missing an abstraction layer somewhere.

--
Julien Danjou
# Free Software hacker freelance
# http://julien.danjou.info
Re: [Openstack] [metering] resource metadata changes and billing
On 07/04/2012 10:55 PM, Julien Danjou wrote:
> On Wed, Jul 04 2012, Nick Barcet wrote:
>
>> I therefore propose that we leave the API as is, knowing the side effects of such high level sum and duration calculations. If we agree on this, I take the action to document the limitation of the summary functions of the API.
>
> So, if I understand you correctly, that would mean the API is unusable for anybody wanting to do fine-grained billing and they would have to connect to the database engine directly?
>
> Because if that's the case I think we are missing an abstraction layer somewhere.

No, this is not what I am saying. The API offers two types of calls:

* One which gives access to organized raw events, which rating engines are likely to be using,
* One which gives access to summary data, but which is only valid for simple use cases.

Hopefully, there are no cases here that should justify connecting to the db engine directly unless one really wants to, and that should not be recommended as we may want to change our storage model over time without breaking existing integrations.

The possible variants for describing possible billing strategies are indeed one abstraction level above this. The telco industry calls the tool handling this a rating engine (what transforms metering into line items of a bill) and I have intentionally proposed since the beginning that Ceilometer should not be a rating engine.

We can hope that our current API will be able to evolve to simplify rating engines' requests over time, and we should be ready to accept extensions to it to do so. However I do think we do not have the necessary experience with rating to propose a universal solution that will effectively work at this time, and therefore am proposing that we postpone that until we do have the experience.

Makes sense?
-- Nick Barcet nick.bar...@canonical.com aka: nijaba, nicolas
Re: [Openstack] [metering] resource metadata changes and billing
On Fri, Jun 29 2012, Doug Hellmann wrote:

Hi Doug,

Sorry for the late reply.

> I don't think I've made the problem clear. I'm not talking about wanting to calculate the different usage for CPU, RAM, etc. The different counters are calculated separately, so we can keep the amounts for CPU and RAM completely separate, and the API allows the outside user to ask for the amounts for each counter for a resource (or globally for a user/project). The problem is in deciding how the metadata associated with a meter event might cause the provider to change the rate they want to charge for that usage.

It's not metadata of a counter that causes the provider to change the rate. It's a meter of a resource that can do that.

> That only solves part of the problem, though. As a provider I may want to charge different flat rates for the amount of RAM being used. For example, 1 unit for 1024 MB of RAM but 2 units for 4096 MB. That means when the size of the VM changes, we need to produce multiple totals (the length of time that the VM had 1024 MB RAM and then the length of time it had 4096 MB RAM).

Yeah, like I said, for the meter 'RAM' of resource 'instance' you can't request a total amount used, because the type of this meter (I don't know what to call it, it's the kind I named if-i-change-you-need-to-split-the-resource-in-several-stuff in my latest email) doesn't have these semantics.

> I might also want to change the rate I bill when a VM is migrated between hosts or availability zones (I think we said migration caused a new instance to be created, but bear with me). The availability zone for an instance is clearly metadata and not something we can track via a counter.

Again, that's also a meter that has the same type as 'RAM' for a resource 'instance'.

> That's an interesting idea, and it might solve the problem. At this point in the Folsom schedule though, I would much rather implement a pared down API that handles the simple cases but makes the caller do a bit more data manipulation for complex cases, in favor of focusing on counting more things than we do now. Is that a reasonable approach?

Problem is that we might break the API a bit with this. This would not be the first time an OpenStack API is broken, but if we can avoid it, it'd be better. I am not sure you can really bill anything if you're not able to handle a simple thing such as a VM resize. So currently, it seems that the API is not designed correctly to handle such a case, and since it's not yet implemented, maybe there's still time to fix it?

--
/* Julien Danjou ╭ Free Software hacker freelance ╰ http://julien.danjou.info */
Re: [Openstack] [metering] resource metadata changes and billing
On Mon, Jul 2, 2012 at 4:58 PM, Julien Danjou jul...@danjou.info wrote:

> On Fri, Jun 29 2012, Doug Hellmann wrote:
>
> Hi Doug,
>
> Sorry for the late reply.
>
>> I don't think I've made the problem clear. I'm not talking about wanting to calculate the different usage for CPU, RAM, etc. The different counters are calculated separately, so we can keep the amounts for CPU and RAM completely separate, and the API allows the outside user to ask for the amounts for each counter for a resource (or globally for a user/project). The problem is in deciding how the metadata associated with a meter event might cause the provider to change the rate they want to charge for that usage.
>
> It's not metadata of a counter that causes the provider to change the rate. It's a meter of a resource that can do that.

No, there are cases where the metadata will affect the rate. For instance, it costs a different amount to have an instance in each of Amazon's availability zones (data centers). The counter would still say that the instance has been running for a certain amount of time, but the *rate* for charging for that time would depend on where it is. A representative from HP requested the same thing in ceilometer, and we may use it at DreamHost, too, eventually.

>> That only solves part of the problem, though. As a provider I may want to charge different flat rates for the amount of RAM being used. For example, 1 unit for 1024 MB of RAM but 2 units for 4096 MB. That means when the size of the VM changes, we need to produce multiple totals (the length of time that the VM had 1024 MB RAM and then the length of time it had 4096 MB RAM).
>
> Yeah, like I said, for the meter 'RAM' of resource 'instance' you can't request a total amount used, because the type of this meter (I don't know what to call it, it's the kind I named if-i-change-you-need-to-split-the-resource-in-several-stuff in my latest email) doesn't have these semantics.
>
>> I might also want to change the rate I bill when a VM is migrated between hosts or availability zones (I think we said migration caused a new instance to be created, but bear with me). The availability zone for an instance is clearly metadata and not something we can track via a counter.
>
> Again, that's also a meter that has the same type as 'RAM' for a resource 'instance'.
>
>> That's an interesting idea, and it might solve the problem. At this point in the Folsom schedule though, I would much rather implement a pared down API that handles the simple cases but makes the caller do a bit more data manipulation for complex cases, in favor of focusing on counting more things than we do now. Is that a reasonable approach?
>
> Problem is that we might break the API a bit with this. This would not be the first time an OpenStack API is broken, but if we can avoid it, it'd be better. I am not sure you can really bill anything if you're not able to handle a simple thing such as a VM resize. So currently, it seems that the API is not designed correctly to handle such a case, and since it's not yet implemented, maybe there's still time to fix it?

We probably have time to fix it before the release. On the other hand, it seems much more important for us to work on writing data collectors (new pollsters, adding notifications to other projects, etc.). I don't think we can do both things.

Doug
Re: [Openstack] [metering] resource metadata changes and billing
On Mon, Jul 02 2012, Doug Hellmann wrote:

> No, there are cases where the metadata will affect the rate. For instance, it costs a different amount to have an instance in each of Amazon's availability zones (data centers). The counter would still say that the instance has been running for a certain amount of time, but the *rate* for charging for that time would depend on where it is. A representative from HP requested the same thing in ceilometer, and we may use it at DreamHost, too, eventually.

I totally agree with you, Doug. I'm just saying that it's not *only* metadata. The zone must be some kind of a meter, even if it's not numeric. It should be a meter with a type that causes the resource (here the instance) to be billed differently (and therefore to generate multiple objects when returning resource usage metering).

Clearly the term meter is probably not the right one, maybe we should split this, but to me it must be extracted from metadata to become something more. Something we can rely on to decide that it is worth splitting the metered resource into different parts because the billing must change (zone, RAM, flavor, volume size…).

Speaking of volume size, if you take the example of a storage volume, you are likely to have the same issue. You may not charge the same thing if your volume's total size is 1 GB or 10 GB, and if it has been resized (not sure it's possible, but one day) you want to know precisely when. Whereas size used is likely to be just a generic absolute meter.

> We probably have time to fix it before the release. On the other hand, it seems much more important for us to work on writing data collectors (new pollsters, adding notifications to other projects, etc.). I don't think we can do both things.

Sure. Anyway, we both know that's a do-ocracy. :-)

--
/* Julien Danjou ╭ Free Software hacker freelance ╰ http://julien.danjou.info */
[Openstack] [metering] resource metadata changes and billing
tl;dr: Ceilometer should ignore resource metadata when computing sums or maximum values for counters through the API.

One of the things we discussed early during the design meetings was the need to track metadata along with resources so providers could use the metadata to determine the rate to charge for using the resource (for example, flavor or availability group of an instance).

While working on the mongodb driver, I've been thinking about how that requirement changes what the API we defined needs to do. At first I thought we would need to try to return multiple values from the sum and max calls so the values could be associated with the resource metadata and a given time range. I've decided that implementing that inside the ceilometer API will be very complex, and unlikely to produce the correct result. I want to work through my reasoning here in case someone else can find a fault in it or propose a solution I wasn't able to find.

First, the scenario: A user boots an instance, lets it run for some time (period 1), then changes the metadata in some way that does not result in the instance being recreated but does result in something the provider would decide introduces a different charge structure. For example, the amount of RAM allocated to the instance might be increased. After running with the new settings for a period of time (period 2), the user changes them back to their original value and the instance continues to run (period 3).

The specific change to the metadata doesn't matter (RAM is just an example), except that the metadata change should not require an instance to be recreated because when that happens the user actually gets a new instance (at least I believe they do based on feedback during an early meeting, please correct me if I'm wrong). Getting a new instance simplifies things immensely, since the new instance is a completely new resource and so we can ignore those cases for the rest of this discussion.
Another important point in the way the scenario is constructed is that the metadata values go from value A to B and then back to their original value A. That means any signature we calculate for the metadata will be the same during periods 1 and 3.

Now we would like for v1/USERS/USER_ID/RESOURCES/instance_id/cpu/VOLUME to return the amount of CPU time used by the instance. However, if that time is billed at a different rate depending on the size of the server then the RAM change will cause a difference in billing rates. We therefore need to return a sequence of 3-tuples containing the total for the counter, the resource metadata, and the time range during which the metadata was in effect.

There are two problems implementing this in a generic way in the API server. First, it turns out to be surprisingly difficult to write an efficient query to compute that time range in the case described because it is not easy to recognize the ranges for period 1 and period 3 as being interrupted by period 2 (finding min or max for a value is easy, but finding the endpoint of period 1 is not because the signature for the metadata is the same for period 1 and 3). Second, while calculating ranges is difficult in itself, what is even more difficult is recognizing *important* changes in the metadata that actually imply a change in the billing rate. The logic for that is up to the deployer and their billing rules.

There are ways to compute the ranges by using multiple queries [1], and we could create some sort of way for the query to specify which fields in the metadata for a given type of resource are important. Both calculations would be expensive to apply in the API server, though, and I think they can be solved more efficiently on the client-side. If the client grabs all (or a large portion) of the data and processes it sequentially, it is easy to test the metadata fields to find changes and treat that condition as the boundary between the time ranges.
My conclusion from all of this (over-)thinking is that the ceilometer API should assume the simple case and ignore the metadata changes when computing the sum or maximum value for a counter over a range of time. More complex processing will be left up to the caller, who can ask for raw metering data in manageable chunks and process them outside of the API. I could be persuaded to do something more complicated if the problems described above can be solved in a relatively simple way, but even then I think we should push that to the v2 API.

Thoughts?

There-and-back-again-ly,
Doug

[1] Find the min time for a (resource, metadata) pair; find min time for any different (resource, metadata) pair greater than the first time; find max for original pair with timestamp less than second min; use the max as starting point to determine the next range; loop.
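A minimal sketch of the client-side processing described above, assuming the caller has already fetched the raw samples for one resource as dictionaries with `timestamp`, `volume` (a per-sample delta) and `metadata` keys. Those key names, the sample data and the `billing_fields` parameter are hypothetical, not taken from the ceilometer API. Because `itertools.groupby` only merges adjacent items, the A, B, A scenario produces three ranges even though periods 1 and 3 share the same signature.

```python
from itertools import groupby
from operator import itemgetter


def split_by_metadata(samples, billing_fields):
    """Group time-ordered samples into consecutive runs with equal billing metadata.

    Returns a list of (metadata_subset, first_timestamp, last_timestamp, total_volume).
    """
    ordered = sorted(samples, key=itemgetter("timestamp"))

    def signature(sample):
        # Only the fields the deployer cares about take part in the comparison.
        return tuple(sample["metadata"].get(f) for f in billing_fields)

    ranges = []
    for sig, run in groupby(ordered, key=signature):
        run = list(run)
        ranges.append((
            dict(zip(billing_fields, sig)),
            run[0]["timestamp"],
            run[-1]["timestamp"],
            sum(s["volume"] for s in run),
        ))
    return ranges


# Illustrative data: RAM goes 1024 -> 4096 -> 1024; the display name also changes once.
samples = [
    {"timestamp": 1, "volume": 10, "metadata": {"memory_mb": 1024, "display_name": "web"}},
    {"timestamp": 2, "volume": 12, "metadata": {"memory_mb": 1024, "display_name": "web-renamed"}},
    {"timestamp": 3, "volume": 30, "metadata": {"memory_mb": 4096, "display_name": "web-renamed"}},
    {"timestamp": 4, "volume": 28, "metadata": {"memory_mb": 4096, "display_name": "web-renamed"}},
    {"timestamp": 5, "volume": 11, "metadata": {"memory_mb": 1024, "display_name": "web-renamed"}},
]
ranges = split_by_metadata(samples, ["memory_mb"])
```

Restricting the signature to `billing_fields` is one way to express which metadata changes are "important": the rename at timestamp 2 does not open a new range, while each RAM change does.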
Re: [Openstack] [metering] resource metadata changes and billing
On Fri, Jun 29 2012, Doug Hellmann wrote:

> Thoughts?

Please correct me if I'm wrong. What I understand is that you're saying that something like:

GET v1/[SOURCES/SOURCE/]USERS/USER_ID/RESOURCES/RESOURCE_ID/METER/VOLUME

as defined in the current API draft, for a counter like instance CPU time or RAM size, makes no sense because it can depend on other metadata from the instance. That sounds right in many cases, especially instances.

The best way to fix this is to not return a sum, but a set of documents describing the different changes. E.g. for a (simplified) instance like you described:

GET v1/users/jd/instances/1234

{ id: 1234,
  name: foobar,
  …,
  memory: 1024,
  cpu_time: 393010,
  start: 2012-06-01 00:00:00,
  end: 2012-06-12 12:00:00,
}
{ id: 1234,
  name: foobar,
  …,
  memory: 2048,
  cpu_time: 1231294,
  start: 2012-06-12 12:00:01,
  end: 2012-06-26 13:24:43
}
{ id: 1234,
  name: foobar,
  …,
  memory: 1024,
  cpu_time: 4013510,
  start: 2012-06-26 13:24:44,
  end: 2012-06-30 23:59:00,
}

Doing this does not sound too crazy. You just have to iterate over each record concerning instance 1234. You create a first document from the first record. Then if you encounter a meter that is:

- a delta (e.g. cpu_time), you add it to the data you got from the latest record in your document
- an incremented counter, you replace the last one with this one
- an absolute counter (e.g. memory), you start a new document, keep the latest values from incremented counters (so you can subtract the value and start from 0), and reset all delta counters to 0.

If none of the absolute counters (RAM, number of CPUs, …) changes, you'll end up with only one document for the whole period. If one changes, you'll get multiple documents describing the resources consumed for each time span.

WDYT?
-- Julien
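A minimal sketch of the document-splitting loop proposed in the message above, under a few assumptions that are not in the thread: each record is a dictionary with `meter`, `type`, `value` and `timestamp` keys, the three meter types are spelled `delta`, `cumulative` (the "incremented counter") and `absolute`, and a new document is only opened when an absolute value actually changes. The function name is hypothetical.

```python
DELTA, CUMULATIVE, ABSOLUTE = "delta", "cumulative", "absolute"


def build_documents(records):
    """Fold time-ordered records for one resource into per-period documents."""
    documents = []
    current = None
    kinds = {}      # meter name -> meter type, learned from the records
    latest = {}     # last raw value seen for each cumulative counter
    baseline = {}   # raw cumulative value at the start of the current document

    for rec in records:
        name, kind = rec["meter"], rec["type"]
        value, ts = rec["value"], rec["timestamp"]
        kinds[name] = kind

        if current is None:
            current = {"start": ts, "end": ts, "values": {}}
        elif kind == ABSOLUTE and current["values"].get(name, value) != value:
            # An absolute meter changed: close this document and open a new one.
            documents.append(current)
            carried = {
                n: (v if kinds[n] == ABSOLUTE else 0)  # deltas and cumulatives restart at 0
                for n, v in current["values"].items()
            }
            current = {"start": ts, "end": ts, "values": carried}
            baseline = dict(latest)

        current["end"] = ts
        values = current["values"]
        if kind == DELTA:
            values[name] = values.get(name, 0) + value
        elif kind == CUMULATIVE:
            latest[name] = value
            values[name] = value - baseline.get(name, 0)
        else:
            values[name] = value

    if current is not None:
        documents.append(current)
    return documents


# Illustrative records: memory goes 1024 -> 2048 -> 1024, as in the example documents.
records = [
    {"timestamp": 1, "meter": "memory", "type": ABSOLUTE, "value": 1024},
    {"timestamp": 2, "meter": "cpu_time", "type": DELTA, "value": 10},
    {"timestamp": 3, "meter": "net_bytes", "type": CUMULATIVE, "value": 100},
    {"timestamp": 4, "meter": "cpu_time", "type": DELTA, "value": 5},
    {"timestamp": 5, "meter": "memory", "type": ABSOLUTE, "value": 2048},
    {"timestamp": 6, "meter": "cpu_time", "type": DELTA, "value": 7},
    {"timestamp": 7, "meter": "net_bytes", "type": CUMULATIVE, "value": 250},
    {"timestamp": 8, "meter": "memory", "type": ABSOLUTE, "value": 1024},
    {"timestamp": 9, "meter": "cpu_time", "type": DELTA, "value": 1},
]
docs = build_documents(records)
```

Keeping a `baseline` per cumulative counter is what lets each document report usage starting from 0, which is the subtraction step mentioned in the third bullet.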
Re: [Openstack] [metering] resource metadata changes and billing
On Fri, Jun 29 2012, Doug Hellmann wrote:

> We do have counters for RAM and CPU separate from instance. But the rate at which the provider bills for those things may vary based on metadata. My example may be bad because it uses 2 values we're measuring, one of which also shows up in the metadata for another.
>
> As a different example, take the instance display name. The display name is under the control of the user and is extremely unlikely to reflect a change in the billing rate. However, changing the display name changes the metadata for the instance. A naive implementation of the processing loop would pick that up and generate multiple documents even though there is no need to do so.

Yep, but the display name is not a counter. Memory is a counter. An instance is made of several counters. We need to split metered objects based on their absolute counters changing (memory, number of cores…), not based on random metadata, i.e. a resource has several meters.

So what was considered as metadata (like memory) so far should be changed to become a meter of a resource (like an instance) and have for this one a special type (not sure about the type name to use).

We may need to refine our model to be a bit more hierarchical, like:

resource -- counter #1 of type 'relative'
         |
         `- counter #2 of type 'absolute'
         `- counter #3 of type 'if-i-change-you-need-to-split-the-resource-in-several-stuff'

etc…

-- Julien
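A minimal sketch of the hierarchical model suggested above, with a resource owning several typed meters. All names here are hypothetical (`MeterType`, `Meter`, `Resource`, and `SPLITTING` as a stand-in for the unnamed third type); the sketch only shows how a meter's type could drive the decision to split usage, not how values would be stored or summed.

```python
from dataclasses import dataclass, field
from enum import Enum


class MeterType(Enum):
    RELATIVE = "relative"    # deltas that can be summed
    ABSOLUTE = "absolute"    # point-in-time values
    SPLITTING = "splitting"  # a change means usage must be reported in separate parts


@dataclass
class Meter:
    name: str
    type: MeterType
    value: object = None


@dataclass
class Resource:
    resource_id: str
    meters: dict = field(default_factory=dict)

    def add(self, meter):
        self.meters[meter.name] = meter

    def update(self, name, value):
        """Record a new value; return True if the change should start a new usage segment."""
        meter = self.meters[name]
        must_split = meter.type is MeterType.SPLITTING and meter.value not in (None, value)
        meter.value = value
        return must_split


instance = Resource("1234")
instance.add(Meter("cpu_time", MeterType.RELATIVE, 0))
instance.add(Meter("memory", MeterType.SPLITTING, 1024))
instance.add(Meter("zone", MeterType.SPLITTING, "zone-1"))
```

Under this model a non-numeric value such as the zone can trigger a split in the same way a RAM change does, while a display name would simply stay in plain metadata.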
Re: [Openstack] [metering] resource metadata changes and billing
On Fri, Jun 29, 2012 at 1:19 PM, Julien Danjou jul...@danjou.info wrote:

> On Fri, Jun 29 2012, Doug Hellmann wrote:
>
>> We do have counters for RAM and CPU separate from instance. But the rate at which the provider bills for those things may vary based on metadata. My example may be bad because it uses 2 values we're measuring, one of which also shows up in the metadata for another.
>>
>> As a different example, take the instance display name. The display name is under the control of the user and is extremely unlikely to reflect a change in the billing rate. However, changing the display name changes the metadata for the instance. A naive implementation of the processing loop would pick that up and generate multiple documents even though there is no need to do so.
>
> Yep, but the display name is not a counter. Memory is a counter. An instance is made of several counters. We need to split metered objects based on their absolute counters changing (memory, number of cores…), not based on random metadata, i.e. a resource has several meters.

I don't think I've made the problem clear. I'm not talking about wanting to calculate the different usage for CPU, RAM, etc. The different counters are calculated separately, so we can keep the amounts for CPU and RAM completely separate, and the API allows the outside user to ask for the amounts for each counter for a resource (or globally for a user/project). The problem is in deciding how the metadata associated with a meter event might cause the provider to change the rate they want to charge for that usage.

> So what was considered as metadata (like memory) so far should be changed to become a meter of a resource (like an instance) and have for this one a special type (not sure about the type name to use).

That only solves part of the problem, though. As a provider I may want to charge different flat rates for the amount of RAM being used. For example, 1 unit for 1024 MB of RAM but 2 units for 4096 MB. That means when the size of the VM changes, we need to produce multiple totals (the length of time that the VM had 1024 MB RAM and then the length of time it had 4096 MB RAM).

I might also want to change the rate I bill when a VM is migrated between hosts or availability zones (I think we said migration caused a new instance to be created, but bear with me). The availability zone for an instance is clearly metadata and not something we can track via a counter.

> We may need to refine our model to be a bit more hierarchical, like:
>
> resource -- counter #1 of type 'relative'
>          |
>          `- counter #2 of type 'absolute'
>          `- counter #3 of type 'if-i-change-you-need-to-split-the-resource-in-several-stuff'

That's an interesting idea, and it might solve the problem. At this point in the Folsom schedule though, I would much rather implement a pared down API that handles the simple cases but makes the caller do a bit more data manipulation for complex cases, in favor of focusing on counting more things than we do now. Is that a reasonable approach?

> etc…
>
> -- Julien
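A minimal sketch of the "multiple totals" idea from the flat-rate example above: compute how long the VM spent at each RAM size, then apply the 1-unit and 2-unit rates. It assumes the rates are per hour and that the caller already has a time-ordered list of size changes expressed in hours; both assumptions, along with the function and variable names, are for illustration only.

```python
# Flat rates from the example: 1 unit for 1024 MB, 2 units for 4096 MB (assumed per hour).
RATE_PER_HOUR = {1024: 1, 4096: 2}


def ram_charges(changes, end):
    """Return (hours spent at each RAM size, total charge).

    changes: time-ordered (timestamp_in_hours, ram_mb) pairs
    end: end of the billing window, in hours
    """
    hours_by_size = {}
    # Each size is in effect until the next change, or until the window ends.
    for (start, ram), (stop, _) in zip(changes, changes[1:] + [(end, None)]):
        hours_by_size[ram] = hours_by_size.get(ram, 0) + (stop - start)
    charge = sum(RATE_PER_HOUR[ram] * hours for ram, hours in hours_by_size.items())
    return hours_by_size, charge


# Illustrative resize history: 1024 MB, then 4096 MB, then back to 1024 MB.
hours, charge = ram_charges([(0, 1024), (10, 4096), (16, 1024)], end=24)
```

A single summed duration of 24 hours could not be priced correctly here, since 18 of those hours fall under one rate and 6 under the other.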