Re: Recording metadata related to activation
Thanks for all the feedback.

From @Dominic Kim

> One option can be storing them as parts of an activation for operators but
> exclude them when returning them in response to the user request.

Ack. Some of the metadata is mainly for diagnostic purposes and may (should?) not be exposed to end users. So any implementation needs to distinguish between public and private metadata.

From @Erez Hadad

> Bottom line: I think this "meta" information needs to be more streamlined
> end-to-end, available to code during invocation and persisted post-factum
> in the activation record.

Adding support for other such "meta" information would be on a case-by-case basis. So far, TransactionId is the only missing meta info that is known beforehand, and hence we now pass it to the action. The other meta info discussed so far is generated after the actual invocation. If we later find meta info that the system knows beforehand, we can add support for passing that too.

From @Tyson Norris

> I think a first step is to create separate meta dictionary on Activation
> (option 1) without changing the API (use annotations) or runtimes. We can
> iterate on invoker/runtime coordination to make passing this data more
> consistent, and change /init /run orchestration separately as needed.

Ack. So any proposed change should only change the internal storage format. To the end user, any such meta info (the generic kind, like TransactionId) should only be exposed via annotations.

From @Matt Rutkowski

> The approach that I have seen work elsewhere I refer to as "tagging", that is
> "tagging" data (in this case activations) with domain-specific identifiers
> used to construct diff. views for diff. domains.

I like this idea. However, my only concern is that converting this to an array would prevent us from being selective about which meta info we index. Keeping meta info as a dictionary gives finer control over which keys an operator wants to index. For example, I may only want to index TransactionId but not the k8s PodId, the latter (podId) being used only for some diagnostic work.

Given that the activation db is very large, I would like to minimize any indexing overhead for meta info. One can still index all the dict keys if needed (both Cosmos and Couch can index all keys under a dict).

Updated Proposal
==

1. Enable the `ContainerResponse` to include a "meta" map. Any key which starts with `_`, like `_podId`, would be considered a private meta key.
2. Record all this meta info in the activation under the "meta" key. This can also be augmented with meta keys supplied by the system itself, like transactionId.
3. When sending the activation record to the client:
   - Remove the "meta" dict
   - Include all "public" meta keys, like `transactionId`, as annotation entries

Chetan Mehrotra

On Wed, Aug 21, 2019 at 9:43 AM Matt Rutkowski wrote:
>
> If we intend to add another top-level key to the data to make it more
> accessible for index/search, we should do so in a manner that is extensible
> for any number of IDs. Index/search, as well as security and business
> audits, require identifiers exclusively, and this, in my view, is different
> from general metadata, which should be more descriptive and disposable.
>
> The approach that I have seen work elsewhere I refer to as "tagging", that
> is, "tagging" data (in this case activations) with domain-specific
> identifiers used to construct different views for different domains.
>
> A single key is associated with a list of any number of these
> domain-specific identifiers, each expressed as a URI. The URI components
> include a prefix/domain that identifies the domain wherein the ID is unique
> (and consequently how to interpret the ID); optional paths can be used to
> further describe the ID's unique space (resource or purpose); and the URI
> ends with the actual ID. URIs, aside from being self-descriptive for
> interpretation, are desirable as they intrinsically avoid collisions and
> also do not require a key, since the URI prefix/domain/path uniquely
> identifies the domain/purpose of the identifier within the same string.
>
> We could define any number of IDs that are recognized by the OW domain and
> even create a reserved prefix to keep them short, e.g.:
>
>     full: "//openwhisk.apache.org/transaction/"
>     prefixed: "ow:transaction-"
>
> For example, let's say an activation handled credit card data; one could
> "tag" the record with a PCI indicator:
>
>     "//GRC20.gov/cloud/security/pci-dss/transaction/"
>
> These could appear on an optional key such as:
>
>     {
>       "tags": [
>         "p1://d1/id1",
>         "p2://d2/id2",
>         ...
>       ]
>     }
>
> Tags do not necessarily need to be for IDs alone; they can also help in
> aggregating search data. For example, we could "tag" all data that was
> assigned to a certain region or cluster using this method as well:
>
>     {
>       "tags": [
>         "//ibmcloud.com/icf/region/us-south/cluster/0fdeg1",
>         "ow:cluster-kube-055b10f",
>         "ow:trans-0555ffca456919",
>         ...
>       ]
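
The three steps of the updated proposal in the message above could be sketched roughly as follows. This is only an illustration in Python, not OpenWhisk code: the function names `record_meta` and `to_client_view` and the sample activation fields are hypothetical, and the only rules taken from the thread are the `_` prefix for private keys, the "meta" key on the stored record, and exposing public keys as annotation entries.

```python
from typing import Any, Dict, List

PRIVATE_PREFIX = "_"  # per the proposal: keys such as "_podId" are private


def record_meta(container_meta: Dict[str, Any],
                system_meta: Dict[str, Any]) -> Dict[str, Any]:
    """Step 2: merge meta reported by the container with system-known keys."""
    merged = dict(container_meta)
    merged.update(system_meta)
    return merged


def to_client_view(activation: Dict[str, Any]) -> Dict[str, Any]:
    """Step 3: drop "meta" and surface only public keys as annotations."""
    view = {k: v for k, v in activation.items() if k != "meta"}
    annotations: List[Dict[str, Any]] = list(activation.get("annotations", []))
    for key, value in sorted(activation.get("meta", {}).items()):
        if not key.startswith(PRIVATE_PREFIX):
            annotations.append({"key": key, "value": value})
    view["annotations"] = annotations
    return view


# Hypothetical stored record: the operator keeps everything under "meta".
stored = {
    "activationId": "a1",
    "annotations": [{"key": "kind", "value": "nodejs:10"}],
    "meta": record_meta({"_podId": "ow_xxx"}, {"transactionId": "xxx"}),
}
client_record = to_client_view(stored)
```

The stored record is left untouched, so an operator can still index or query `meta._podId`, while the client-facing copy carries `transactionId` only as an annotation.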
Re: Recording metadata related to activation
If we intend to add another top-level key to the data to make it more accessible for index/search, we should do so in a manner that is extensible for any number of IDs. Index/search, as well as security and business audits, require identifiers exclusively, and this, in my view, is different from general metadata, which should be more descriptive and disposable.

The approach that I have seen work elsewhere I refer to as "tagging", that is, "tagging" data (in this case activations) with domain-specific identifiers used to construct different views for different domains.

A single key is associated with a list of any number of these domain-specific identifiers, each expressed as a URI. The URI components include a prefix/domain that identifies the domain wherein the ID is unique (and consequently how to interpret the ID); optional paths can be used to further describe the ID's unique space (resource or purpose); and the URI ends with the actual ID. URIs, aside from being self-descriptive for interpretation, are desirable as they intrinsically avoid collisions and also do not require a key, since the URI prefix/domain/path uniquely identifies the domain/purpose of the identifier within the same string.

We could define any number of IDs that are recognized by the OW domain and even create a reserved prefix to keep them short, e.g.:

    full: "//openwhisk.apache.org/transaction/"
    prefixed: "ow:transaction-"

For example, let's say an activation handled credit card data; one could "tag" the record with a PCI indicator:

    "//GRC20.gov/cloud/security/pci-dss/transaction/"

These could appear on an optional key such as:

    {
      "tags": [
        "p1://d1/id1",
        "p2://d2/id2",
        ...
      ]
    }

Tags do not necessarily need to be for IDs alone; they can also help in aggregating search data. For example, we could "tag" all data that was assigned to a certain region or cluster using this method as well:

    {
      "tags": [
        "//ibmcloud.com/icf/region/us-south/cluster/0fdeg1",
        "ow:cluster-kube-055b10f",
        "ow:trans-0555ffca456919",
        ...
      ]
    }

Of course, the array could be limited in size, and downstream processors (search or otherwise) could easily "pick out" the tags they care about and discard the ones they do not.

On 2019/08/20 10:30:19, Chetan Mehrotra wrote:
> Hi Team,
>
> Branching the thread [1] to discuss how to record some metadata
> related to activation. Based on some of the use cases I see a need to
> record some more metadata related to activation. Some examples are
>
> 1. transactionId - Record the transactionId of the transaction the
>    activation is part of
> 2. pod name - Records the pod running the action container when using
>    KubernetesContainerFactory
> 3. invocationId - Some id returned by the underlying system when
>    integrating with AWS Lambda or Azure Function
> 4. clusterId - If running multiple clusters for the same system we would
>    like to know which cluster handled the given execution
>
> Some of these ids are determined as part of `ContainerResponse` itself
> and have to be made part of the activation json such that later we can
> correlate the activation with other parts.
>
> Now we need to determine how to store such ids
>
> Option 1 - New "meta" sub document
> ---
>
> Introduce a new "meta" key in the activation json under which we store
> such ids
>
>     "meta" : {
>       "transactionId" : "xxx",
>       "podId" : "ow_xxx"
>     }
>
> Option 2 - Store them as annotations
> ---
>
> Instead of introducing a new field we store them as annotations. Note
> we still make a change in code to capture such data as part of
> `ContainerResponse` but just map it to annotations
>
> One drawback of this approach is that the current structure of
> annotations makes it harder to index such fields. A flat structure, as
> with the "meta" field, enables indexing such fields in dbs other than
> Couch
>
> Chetan Mehrotra
> [1]: https://lists.apache.org/thread.html/f8b73a9ffb0d09a50aecfb54538da2e8365c54dcc3e26a78382ad7bd@%3Cdev.openwhisk.apache.org%3E
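
To make the "pick out what they care about" step in the tagging message above concrete, here is a small Python sketch. The helper names `select_tags` and `ow_tag_value` are hypothetical, and the reading of the short form as `ow:<name>-<id>` is an assumption based only on the two `ow:` examples given in the message.

```python
from typing import Iterable, List, Optional

OW_PREFIX = "ow:"  # reserved short prefix suggested in the message


def select_tags(tags: Iterable[str], prefixes: Iterable[str]) -> List[str]:
    """Keep only the tags whose domain/prefix a consumer cares about."""
    wanted = tuple(prefixes)
    return [tag for tag in tags if tag.startswith(wanted)]


def ow_tag_value(tags: Iterable[str], name: str) -> Optional[str]:
    """Return the ID from the first "ow:<name>-<id>" tag, or None."""
    marker = f"{OW_PREFIX}{name}-"
    for tag in tags:
        if tag.startswith(marker):
            return tag[len(marker):]
    return None


# Sample tags copied from the message (the "..." entry is left out).
tags = [
    "//ibmcloud.com/icf/region/us-south/cluster/0fdeg1",
    "ow:cluster-kube-055b10f",
    "ow:trans-0555ffca456919",
]
ow_only = select_tags(tags, [OW_PREFIX])
transaction = ow_tag_value(tags, "trans")
```

Because each tag carries its own domain, a consumer needs no schema beyond the prefixes it recognizes; the trade-off raised elsewhere in the thread is that a database cannot index one tag without indexing the whole array.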
Re: Recording metadata related to activation
This part (exposing the transaction id to action code) is provided via https://github.com/apache/openwhisk/pull/4586

I'm not sure what other meta may exist or be planned that does not already follow this pattern, but I agree it should all be included where possible. We cannot include the "duration", since that is only available after execution, but action config, like limits, may be useful to include here as well?

For now, the data fields from ActivationMessage and ExecutableWhiskAction are explicitly extracted and provided to the runtime in an "environment" map. We could certainly change this to be more generic, like inferring map keys from all fields, or just sending json, but this is a bigger change to coordinate with runtimes, and gets into the question of whether /init and /run should have different signatures, I think.

I think a first step is to create a separate meta dictionary on Activation (option 1) without changing the API (use annotations) or runtimes. We can iterate on invoker/runtime coordination to make passing this data more consistent, and change /init /run orchestration separately as needed.

Thanks
Tyson

On 8/21/19, 3:05 AM, "Erez Hadad" wrote:

> On the same note, why not also expose this "meta" information to the
> action code *at runtime*?
>
> The current direction this discussion is going seems to be having the
> "meta" information only after the action completes, in an activation
> record (under a new key or as annotations). However, think of the
> following use-case: the "transaction id" can be useful for having
> multiple actions performing computation as part of a single transaction,
> and updating a DB. In such a case, the action code needs to know the
> transaction id so it can be passed to the DB service, marking the
> resulting update as part of the broader transaction. Similar cases can be
> made for other fields.
>
> Bottom line: I think this "meta" information needs to be more streamlined
> end-to-end, available to code during invocation and persisted post-factum
> in the activation record.
>
> Regards,
> -- Erez
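
The difference between explicitly extracting fields and inferring map keys generically, as discussed in the message above, might look something like this in outline. This is a Python illustration only: the `ActivationMessage` dataclass and its fields are simplified, hypothetical stand-ins and do not mirror the real OpenWhisk classes.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class ActivationMessage:
    """Hypothetical stand-in with a few illustrative fields."""
    transaction_id: str
    activation_id: str
    namespace: str


def explicit_environment(msg: ActivationMessage) -> Dict[str, str]:
    """Current style: every exposed field is listed by hand."""
    return {
        "transaction_id": msg.transaction_id,
        "activation_id": msg.activation_id,
    }


def generic_environment(msg: ActivationMessage) -> Dict[str, str]:
    """Generic style: map keys are inferred from all fields."""
    return {key: str(value) for key, value in asdict(msg).items()}


msg = ActivationMessage("tid-1", "act-1", "guest")
explicit_env = explicit_environment(msg)
generic_env = generic_environment(msg)
```

The generic form picks up new fields automatically, which is exactly why it needs coordination with the runtimes: anything added to the message becomes visible to action code without a deliberate decision.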
Re: Recording metadata related to activation
On the same note, why not also expose this "meta" information to the action code *at runtime*?

The current direction this discussion is going seems to be having the "meta" information only after the action completes, in an activation record (under a new key or as annotations). However, think of the following use-case: the "transaction id" can be useful for having multiple actions performing computation as part of a single transaction, and updating a DB. In such a case, the action code needs to know the transaction id so it can be passed to the DB service, marking the resulting update as part of the broader transaction. Similar cases can be made for other fields.

Bottom line: I think this "meta" information needs to be more streamlined end-to-end, available to code during invocation and persisted post-factum in the activation record.

Regards,
-- Erez

From: Dominic Kim
To: dev@openwhisk.apache.org
Date: 21/08/2019 02:58
Subject: [EXTERNAL] Re: Recording metadata related to activation

> That would be useful from the operator point of view.
> One question is "would that information be exposed to users"?
>
> I think the information which is exposed to users should be
> platform-independent. No matter which underlying platform/implementation
> is being used, users do not, and should not, need to know about the
> internals, so that even if the operator changes their internals (K8s,
> native, cluster federation, ...) there is no difference in user
> experience.
>
> One option can be storing them as parts of an activation for operators but
> exclude them when returning them in response to the user request.
> Though I am not sure whether this can be aligned with what you have in
> mind.
>
> Regarding the two structure options, I am inclined to use the existing
> structure "annotations" as it does not introduce any schema change.
> However, I also found it cumbersome to manipulate them in many cases.
> I feel it would be great to change annotations to a dictionary at some
> point. Since I am not aware of the history, I am curious whether there is
> any specific reason that annotations should be in their current form.
>
> Best regards
> Dominic
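
The runtime use-case described in the message above (an action forwarding the transaction id to a DB service) can be sketched as below. This is a hedged Python illustration: it assumes the id reaches the action as an environment variable named `__OW_TRANSACTION_ID`, which the thread itself does not spell out, and `build_db_update` together with the update payload shape is hypothetical.

```python
import os
from typing import Any, Dict, Mapping

# Assumed variable name; the thread only says the id is passed to the action.
TRANSACTION_ENV_VAR = "__OW_TRANSACTION_ID"


def build_db_update(args: Mapping[str, Any],
                    environ: Mapping[str, str]) -> Dict[str, Any]:
    """Attach the transaction id to a (hypothetical) DB update request."""
    return {
        "document": args.get("document"),
        "transaction": environ.get(TRANSACTION_ENV_VAR, "unknown"),
    }


def main(args: Dict[str, Any]) -> Dict[str, Any]:
    """Action entry point: several actions tag writes with one shared id."""
    return {"update": build_db_update(args, os.environ)}
```

Two actions invoked within the same transaction would produce updates carrying the same `transaction` value, which is what lets the DB service group them.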
Re: Recording metadata related to activation
That would be useful from the operator point of view.
One question is "would that information be exposed to users"?

I think the information which is exposed to users should be platform-independent. No matter which underlying platform/implementation is being used, users do not, and should not, need to know about the internals, so that even if the operator changes their internals (K8s, native, cluster federation, ...) there is no difference in user experience.

One option can be storing them as parts of an activation for operators but exclude them when returning them in response to the user request. Though I am not sure whether this can be aligned with what you have in mind.

Regarding the two structure options, I am inclined to use the existing structure "annotations" as it does not introduce any schema change. However, I also found it cumbersome to manipulate them in many cases. I feel it would be great to change annotations to a dictionary at some point. Since I am not aware of the history, I am curious whether there is any specific reason that annotations should be in their current form.

Best regards
Dominic

On Wed, Aug 21, 2019 at 12:38 AM, Matt Sicker wrote:

> I mean, unless you're using these correlation ids in your business
> logic, I don't see the problem of storing them in the database. My own
> thoughts on using this feature would all be diagnostics-related. I'm
> not running any non-trivial functions, though.
>
> --
> Matt Sicker
Re: Recording metadata related to activation
I mean, unless you're using these correlation ids in your business logic, I don't see the problem of storing them in the database. My own thoughts on using this feature would all be diagnostics-related. I'm not running any non-trivial functions, though.

On Tue, 20 Aug 2019 at 05:30, Chetan Mehrotra wrote:
>
> Hi Team,
>
> Branching the thread [1] to discuss how to record some metadata
> related to activation. Based on some of the usecases I see a need to
> record some more metadata related to activation. Some examples are
>
> 1. transactionId - Record the transactionId for which the activation is
>    part of
> 2. pod name - Records the pod running the action container when using
>    KubernetesContainerFactory
> 3. invocationId - Some id returned by underlying system when
>    integrating with AWS Lambda or Azure Function
> 4. clusterId - If running multiple clusters for same system we would
>    like to know which cluster handed the given execution
>
> Some of these ids are determined as part of `ContainerResponse` itself
> and have to be made part of activation json such that later we can
> correlate the activation with other parts.
>
> Now we need to determine how to store such id
>
> Option 1 - New "meta" sub document
> ---
>
> Introduce a new "meta" key in activation json under which we store such ids
>
>     "meta" : {
>       "transactionId" : "xxx",
>       "podId" : "ow_xxx"
>     }
>
> Option 2 - Store them as annotations
> -
>
> Instead of introducing a new field we store them as annotations. Note
> we still make change in code to capture such data as part of
> `ContainerResponse` but just map it to annotations
>
> One drawback of this approach is that current approach of annotations
> make it harder to index such fields easily. Having a flat structure
> like with "meta" field enables indexing such fields in db's other than
> Couch
>
> Chetan Mehrotra
> [1]: https://lists.apache.org/thread.html/f8b73a9ffb0d09a50aecfb54538da2e8365c54dcc3e26a78382ad7bd@%3Cdev.openwhisk.apache.org%3E

--
Matt Sicker
Recording metadata related to activation
Hi Team,

Branching the thread [1] to discuss how to record some metadata related to activation. Based on some of the use cases I see a need to record some more metadata related to activation. Some examples are

1. transactionId - Record the transactionId of the transaction the activation is part of
2. pod name - Records the pod running the action container when using KubernetesContainerFactory
3. invocationId - Some id returned by the underlying system when integrating with AWS Lambda or Azure Function
4. clusterId - If running multiple clusters for the same system we would like to know which cluster handled the given execution

Some of these ids are determined as part of `ContainerResponse` itself and have to be made part of the activation json such that later we can correlate the activation with other parts.

Now we need to determine how to store such ids

Option 1 - New "meta" sub document
---

Introduce a new "meta" key in the activation json under which we store such ids

    "meta" : {
      "transactionId" : "xxx",
      "podId" : "ow_xxx"
    }

Option 2 - Store them as annotations
---

Instead of introducing a new field we store them as annotations. Note we still make a change in code to capture such data as part of `ContainerResponse` but just map it to annotations

One drawback of this approach is that the current structure of annotations makes it harder to index such fields. A flat structure, as with the "meta" field, enables indexing such fields in dbs other than Couch

Chetan Mehrotra
[1]: https://lists.apache.org/thread.html/f8b73a9ffb0d09a50aecfb54538da2e8365c54dcc3e26a78382ad7bd@%3Cdev.openwhisk.apache.org%3E
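
The indexing drawback mentioned for option 2 comes down to the shape of the data. The Python sketch below contrasts the two options, assuming annotations are stored as a list of `{"key": ..., "value": ...}` entries; the helper names `as_annotations` and `lookup_annotation` are hypothetical.

```python
from typing import Any, Dict, List, Optional


def as_annotations(meta: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Option 2: flatten the ids into key/value annotation entries."""
    return [{"key": key, "value": value} for key, value in meta.items()]


def lookup_annotation(annotations: List[Dict[str, Any]],
                      key: str) -> Optional[Any]:
    """Finding one id means scanning the list for a matching entry."""
    for entry in annotations:
        if entry["key"] == key:
            return entry["value"]
    return None


meta = {"transactionId": "xxx", "podId": "ow_xxx"}

# Option 1: the id sits at a fixed path, "meta.transactionId".
option1 = {"activationId": "a1", "meta": meta}
tid_from_meta = option1["meta"]["transactionId"]

# Option 2: the same id is one of several entries at "annotations[*]".
option2 = {"activationId": "a1", "annotations": as_annotations(meta)}
tid_from_annotations = lookup_annotation(option2["annotations"], "transactionId")
```

A database index is normally declared on a path. Option 1 gives each id its own path, whereas in option 2 every id shares the `annotations[*].value` path and has to be told apart by its sibling `key`.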