Re: Recording metadata related to activation

2019-08-27 Thread Chetan Mehrotra
Thanks for all the feedback

From @Dominic Kim

> One option can be storing them as parts of an activation for operators but
> exclude them when returning them in response to the user request.

Ack. Some of the metadata is mainly for diagnostic purposes and may
(should?) not be exposed to end users. So any implementation needs to
distinguish between public and private metadata.

From @Erez Hadad

> Bottom line: I think this "meta" information needs to be more streamlined
> end-to-end, available to code during invocation and persisted post-factum
> in the activation record.

Adding support for other such "meta" information would be on a
case-by-case basis. So far TransactionId is the only missing meta info
which we know beforehand, and hence we now pass it to the action. The
other meta info discussed so far is generated after the actual
invocation. If we later find any meta info which the system knows
beforehand, we can add support for passing that too.

From @Tyson Norris

> I think a first step is to create separate meta dictionary on Activation 
> (option 1) without changing the API (use annotations) or runtimes. We can 
> iterate on invoker/runtime coordination to make passing this data more 
> consistent, and change /init /run orchestration separately as needed.

Ack. So any proposed change should only change the internal storage
format. To the end user, any such meta info (that which is generic, like
TransactionId) should only be exposed via annotations.

From @Matt Rutkowski

> The approach that I have seen work elsewhere I refer to as "tagging", that is 
> "tagging" data (in this case activations) with domain-specific identifiers 
> used to construct diff. views for diff. domains.

I like this idea. However, my only concern is that converting this to
an array would prevent us from being selective about which meta info we
index. Having the meta info as a dictionary would provide finer
control over which meta info the operator wants to index. For example,
I may only want to index TransactionId but not the k8s PodId, the
latter (podId) being used only for some diagnostic work.

Given that the activation db is very large, I would like to minimize any
overhead in terms of indexing of meta info. One can still index all
the dict keys if needed (both Cosmos and Couch can index all keys
under a dict).

Updated Proposal
==

1. Enable the `ContainerResponse` to include a "meta" map. Any key
which starts with `_`, like `_podId`, would be considered a private
meta key.
2. Record all this meta info in the activation under the "meta" key.
This can also be augmented with meta keys supplied by the system
itself, like transactionId.
3. When sending the Activation record to the client:
- Remove the "meta" dict
- Include all "public" meta keys like `transactionId` as annotation entries
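
The three steps above could be sketched roughly as follows. This is only
an illustration in Python, not OpenWhisk code: the function names
`record_meta` and `to_client_view` are hypothetical, and the annotation
shape (a list of key/value objects) is assumed to match the existing
activation format.

```python
from copy import deepcopy

PRIVATE_PREFIX = "_"  # per the proposal: keys such as "_podId" are private


def record_meta(activation, container_meta, system_meta):
    """Step 2: store container-supplied and system-supplied meta under "meta"."""
    stored = deepcopy(activation)
    stored["meta"] = {**container_meta, **system_meta}
    return stored


def to_client_view(stored):
    """Step 3: drop "meta" and surface only the public keys as annotations."""
    view = deepcopy(stored)
    meta = view.pop("meta", {})
    annotations = view.setdefault("annotations", [])
    for key, value in meta.items():
        if not key.startswith(PRIVATE_PREFIX):
            annotations.append({"key": key, "value": value})
    return view


# Hypothetical activation record with one private and one public meta key.
stored = record_meta(
    {"activationId": "a1", "annotations": []},
    container_meta={"_podId": "ow_xxx"},
    system_meta={"transactionId": "xxx"},
)
client = to_client_view(stored)
```

The stored record keeps both keys for operators, while the client view
only carries `transactionId`, and only as an annotation.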

Chetan Mehrotra


Re: Recording metadata related to activation

2019-08-21 Thread Matt Rutkowski
If we intend to add another top-level key to the data to make it more 
accessible for index/search, we should do so in a manner that is extensible for 
any number of IDs.  Index/search, as well as security and business audits, 
require identifiers exclusively and this, in my view, is different from general 
metadata which should be more descriptive and disposable.

The approach that I have seen work elsewhere I refer to as "tagging", that is 
"tagging" data (in this case activations) with domain-specific identifiers used 
to construct diff. views for diff. domains. 

A single key is associated with a list of any number of these
domain-specific identifiers, each expressed as a URI. The URI components
include a prefix/domain that identifies the domain wherein the ID is unique
(and consequently how to interpret the ID); optional paths can be used to
further describe the ID's unique space (resource or purpose); and the URI
ends with the actual ID. URIs, aside from being self-descriptive for
interpretation, are desirable as they intrinsically avoid collisions and do
not require a key, since the URI prefix/domain/path uniquely identifies the
domain/purpose of the identifier within the same string.

We could define any number of IDs that are recognized by the OW domain and 
even create a reserved prefix to keep them short, e.g.:

full: "//openwhisk.apache.org/transaction/"
prefixed: "ow:transaction-"

For example, let's say an activation handled credit card data; one could "tag" 
the record with a PCI indicator:

"//GRC20.gov/cloud/security/pci-dss/transaction/"

These could appear on an optional key such as:

{
   "tags": [
      "p1://d1/id1",
      "p2://d2/id2",
      ...
   ]
}

Tags do not necessarily need to be for IDs alone; they can also help in 
aggregating search data. For example, we could "tag" all data that was 
assigned to a certain region or cluster using this method as well:

{
   "tags": [
      "//ibmcloud.com/icf/region/us-south/cluster/0fdeg1",
      "ow:cluster-kube-055b10f",
      "ow:trans-0555ffca456919",
      ...
   ]
}

Of course, the array could be limited in size, and downstream processors (search 
or otherwise) could easily "pick out" the tags they care about and discard 
the ones they do not.
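
As a rough illustration of how a downstream processor might "pick out"
tags, here is a minimal Python sketch. The helper name `pick_tags` is
hypothetical, and matching on a plain string prefix is an assumption
about how the URI prefix/domain would be recognized.

```python
def pick_tags(tags, prefixes):
    """Return only the tags that start with one of the given prefixes."""
    wanted = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return [tag for tag in tags if tag.startswith(wanted)]


tags = [
    "//ibmcloud.com/icf/region/us-south/cluster/0fdeg1",
    "ow:cluster-kube-055b10f",
    "ow:trans-0555ffca456919",
]

# A processor that only cares about OW-reserved tags discards the rest.
ow_tags = pick_tags(tags, ["ow:"])
# A processor interested in one external domain does the same with its prefix.
region_tags = pick_tags(tags, ["//ibmcloud.com/"])
```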



Re: Recording metadata related to activation

2019-08-21 Thread Tyson Norris
This part (exposing transaction id to action code) is provided via 
https://github.com/apache/openwhisk/pull/4586

I'm not sure what other meta may exist or be planned that does not already 
follow this pattern, but I agree it should all be included where possible. We 
cannot include the "duration", since that is only available after execution, 
but action config, like limits, may be useful to include here as well? 

For now, the data fields from ActivationMessage and ExecutableWhiskAction are 
explicitly extracted and provided to the runtime in an "environment" map - we 
could certainly change this to be more generic, like inferring map keys from 
all fields, or just sending json, but this is a bigger change to coordinate 
with runtimes, and gets into the question of whether /init and /run should have 
different signatures, I think.
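
To make the contrast concrete, here is a small Python sketch of the two
styles. It is not the invoker code: `ActivationMessageSketch` and its
field names are hypothetical stand-ins, and the real "environment" map
may be built differently.

```python
from dataclasses import asdict, dataclass


@dataclass
class ActivationMessageSketch:
    # Hypothetical fields, only for illustration.
    activation_id: str
    transaction_id: str
    namespace: str


def explicit_environment(msg):
    """Current style: each field passed to the runtime is listed by hand."""
    return {
        "activation_id": msg.activation_id,
        "transaction_id": msg.transaction_id,
    }


def generic_environment(msg):
    """More generic style: map keys are inferred from all fields."""
    return asdict(msg)


msg = ActivationMessageSketch("a1", "tx-1", "guest")
explicit = explicit_environment(msg)
generic = generic_environment(msg)
```

With the explicit style a new field stays invisible to the runtime until
someone adds it; with the generic style every field is passed along,
which is why the latter needs more coordination with runtimes.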

I think a first step is to create separate meta dictionary on Activation 
(option 1) without changing the API (use annotations) or runtimes. We can 
iterate on invoker/runtime coordination to make passing this data more 
consistent, and change /init /run orchestration separately as needed. 

Thanks
Tyson

 

Re: Recording metadata related to activation

2019-08-21 Thread Erez Hadad
On the same note, why not also expose this "meta" information to the 
action code *at runtime*? 
The current direction this discussion is going seems to be having the 
"meta" information only after the action completes, in an activation 
record (under new key or as annotations).

However, think of the following use-case: the "transaction id" can be 
useful for having multiple actions performing computation as part of a 
single transaction, and updating a DB. In such a case, the action code 
needs to know the transaction id so it can be passed to the DB service, 
marking the resulting update as part of the broader transaction. 
Similar cases can be made for other fields. 

Bottom line: I think this "meta" information needs to be more streamlined 
end-to-end, available to code during invocation and persisted post-factum 
in the activation record.
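
A minimal sketch of that use case as a Python action follows. It assumes
the runtime exposes the transaction id to the action as an environment
variable; the variable name `__OW_TRANSACTION_ID` and the shape of the
update passed to the DB service are assumptions made for illustration.

```python
import os


def main(args):
    # Assumption: the runtime exports the transaction id under this name.
    transaction_id = os.environ.get("__OW_TRANSACTION_ID", "unknown")

    # Hypothetical payload for a DB service: the update is marked with the
    # transaction id so it can be tied to the broader transaction.
    update = {"transactionId": transaction_id, "item": args.get("item")}
    return {"update": update}
```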

Regards,
-- Erez








Re: Recording metadata related to activation

2019-08-20 Thread Dominic Kim
That would be useful from the operator point of view.
One question is "would that information be exposed to users"?

I think the information which is exposed to users should be
platform-independent.
No matter which underlying platform/implementation is being used, users do
not and should not need to know about the internals, so that even if the
operator changes their internals (K8s, native, cluster federation, ...)
there is no difference in user experience.

One option can be storing them as parts of an activation for operators but
exclude them when returning them in response to the user request.
Though I am not sure whether this aligns with what you have in mind.


Regarding the two structure options, I am inclined to use the existing
structure "annotations" as it does not introduce any schema change.
However, I also found it cumbersome to manipulate them in many cases.
I feel it would be great to change annotations to a dictionary at some
point.

Since I am not aware of the history, I am curious whether there is any
specific reason that annotations are in their current form.
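
To illustrate what "changing annotations to a dictionary" could look
like, here is a small Python sketch. It assumes annotations are stored
as a list of key/value objects; the helper names are hypothetical.

```python
def annotations_to_dict(annotations):
    """Turn [{"key": k, "value": v}, ...] into {k: v, ...}."""
    return {entry["key"]: entry["value"] for entry in annotations}


def dict_to_annotations(mapping):
    """Turn {k: v, ...} back into the list-of-entries form."""
    return [{"key": key, "value": value} for key, value in mapping.items()]


annotations = [
    {"key": "path", "value": "guest/hello"},
    {"key": "transactionId", "value": "xxx"},
]

as_dict = annotations_to_dict(annotations)
round_trip = dict_to_annotations(as_dict)
```

With the dictionary form, reading or updating a single entry is a plain
key access instead of a scan over the list.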

Best regards
Dominic



Re: Recording metadata related to activation

2019-08-20 Thread Matt Sicker
I mean, unless you're using these correlation ids in your business
logic, I don't see the problem of storing them in the database. My own
thoughts on using this feature would all be diagnostics-related. I'm
not running any non-trivial functions, though.




-- 
Matt Sicker 


Recording metadata related to activation

2019-08-20 Thread Chetan Mehrotra
Hi Team,

Branching the thread [1] to discuss how to record some metadata
related to activation. Based on some of the use cases, I see a need to
record some more metadata related to activation. Some examples are:

1. transactionId - Records the transactionId of which the activation is a part
2. pod name - Records the pod running the action container when using
KubernetesContainerFactory
3. invocationId - Some id returned by the underlying system when
integrating with AWS Lambda or Azure Functions
4. clusterId - If running multiple clusters for the same system, we would
like to know which cluster handled the given execution

Some of these ids are determined as part of `ContainerResponse` itself
and have to be made part of the activation json so that later we can
correlate the activation with other parts.

Now we need to determine how to store such ids.

Option 1 - New "meta" sub document
---

Introduce a new "meta" key in the activation json under which we store such ids

"meta" : {
    "transactionId" : "xxx",
    "podId" : "ow_xxx"
}


Option 2 - Store them as annotations
-

Instead of introducing a new field, we store them as annotations. Note
that we would still change the code to capture such data as part of
`ContainerResponse`, but just map it to annotations.

One drawback of this approach is that the current form of annotations
makes it harder to index such fields. Having a flat structure, as
with the "meta" field, enables indexing such fields in dbs other than
Couch.
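
The difference between the two options can be sketched in Python as
follows. This is only an illustration: the helper names are
hypothetical, and the annotation shape (a list of key/value objects) is
assumed. With the "meta" sub document, a field is reachable through a
fixed path such as `meta.transactionId`; with annotations, the same
value has to be found by scanning the list for a matching key.

```python
def get_path(doc, path):
    """Resolve a dotted field path such as "meta.transactionId"."""
    node = doc
    for part in path.split("."):
        node = node[part]
    return node


def get_annotation(doc, key):
    """Scan the annotation list for the entry with the given key."""
    for entry in doc.get("annotations", []):
        if entry["key"] == key:
            return entry["value"]
    return None


# Option 1: flat "meta" sub document.
option1 = {"meta": {"transactionId": "xxx", "podId": "ow_xxx"}}
# Option 2: the same ids stored as annotations.
option2 = {
    "annotations": [
        {"key": "transactionId", "value": "xxx"},
        {"key": "podId", "value": "ow_xxx"},
    ]
}

from_meta = get_path(option1, "meta.transactionId")
from_annotations = get_annotation(option2, "transactionId")
```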

Chetan Mehrotra
[1]: 
https://lists.apache.org/thread.html/f8b73a9ffb0d09a50aecfb54538da2e8365c54dcc3e26a78382ad7bd@%3Cdev.openwhisk.apache.org%3E