Re: New OpenWhisk PMC Members: Brendan Doyle and Cosmin Stanciu

2022-02-23 Thread Tyson Norris
Congratulations Cosmin and Brendan!!!

Best,
Tyson

On Wed, Feb 23, 2022 at 6:57 AM Rodric Rabbah  wrote:

> Congratulations Brendan and Cosmin!
>
> -r
>
> On Wed, Feb 23, 2022 at 6:30 AM Matt Rutkowski 
> wrote:
>
> > Welcome and congratulations Brendan and Cosmin!
> >
> > Kind regards,
> > Matt
> >
>


Re: [VOTE] Release Apache OpenWhisk Client Js (v3.21.6, rc1)

2022-01-04 Thread Tyson Norris
+1 to release Apache OpenWhisk Client Js (v3.21.6, rc1)
Thanks
Tyson

From: Rob Allen 
Date: Friday, December 31, 2021 at 3:10 AM
To: dev@openwhisk.apache.org 
Subject: Re: [VOTE] Release Apache OpenWhisk Client Js (v3.21.6, rc1)
+1 to release Apache OpenWhisk Client Js (v3.21.6, rc1)

Checked with rcverify.sh (script SHA1: 7FC5 5DBE 1809 6D92 DEFF  0E31 D138 059B 
8F27 20F7)

Regards,

Rob

> On 31 Dec 2021, at 05:53, OpenWhisk Release  wrote:
>
> Hi,
>
> This is a call to vote on releasing version 3.21.6 release candidate rc1 of 
> the following project module with artifacts built from the Git repositories 
> and commit IDs listed below.
>
> * OpenWhisk Client Js: 1aba396e8a59afd5a90acb8157f2009746d7a714
>  
> https://github.com/apache/openwhisk-client-js/commit/1aba396e8a59afd5a90acb8157f2009746d7a714
>  
> https://dist.apache.org/repos/dist/dev/openwhisk/rc1/openwhisk-client-js-3.21.6-sources.tar.gz
>  
> https://dist.apache.org/repos/dist/dev/openwhisk/rc1/openwhisk-client-js-3.21.6-sources.tar.gz.asc
>  
> https://dist.apache.org/repos/dist/dev/openwhisk/rc1/openwhisk-client-js-3.21.6-sources.tar.gz.sha512
>
> This release is comprised of source code distribution only.
>
> You can use this UNIX script to download the release and verify the checklist 
> below:
> https://gitbox.apache.org/repos/asf?p=openwhisk-release.git;a=blob_plain;f=tools/rcverify.sh;hb=ba8a21f
>
> Usage:
> curl -s "https://gitbox.apache.org/repos/asf?p=openwhisk-release.git;a=blob_plain;f=tools/rcverify.sh;hb=ba8a21f" -o rcverify.sh
> chmod +x rcverify.sh
> ./rcverify.sh openwhisk-client-js 3.21.6 rc1
>
> Please vote to approve this release:
>
>  [ ] +1 Approve the release
>  [ ]  0 Don't care
>  [ ] -1 Don't release, because ...
>
> Release verification checklist for reference:
>  [ ] Download links are valid.
>  [ ] Checksums and PGP signatures are valid.
>  [ ] Source code artifacts have correct names matching the current release.
>  [ ] LICENSE and NOTICE files are correct for each OpenWhisk repository.
>  [ ] All files have license headers as specified by OpenWhisk project policy 
> [1].
>  [ ] No compiled archives bundled in source archive.
>
> This majority vote is open for at least 72 hours.
>
>
> [1] 
> https://github.com/apache/openwhisk-release/blob/master/docs/license_compliance.md


Re: [VOTE] Release Apache OpenWhisk Client Js (v3.21.5, rc1)

2021-11-03 Thread Tyson Norris
+1 for the release of openwhisk-client-js 3.21.5 rc1

Thanks Cosmin!

On Wed, Nov 3, 2021 at 6:52 AM Matt Rutkowski  wrote:

> +1 for the release of openwhisk-client-js 3.21.5 rc1
>
> -Matt
>
> Verified locally:
> $ ./rcverify.sh openwhisk-client-js 3.21.5 rc1
> rcverify.sh (script SHA1: 7FC5 5DBE 1809 6D92 DEFF  0E31 D138 059B 8F27
> 20F7)
> working in the following directory:
>
> /var/folders/bc/p62kjnm12n9_l30qfxv7p9lcgn/T/tmp.Mgdl0xz4
> fetching tarball and signatures from
> https://dist.apache.org/repos/dist/dev/openwhisk/rc1
> fetching openwhisk-client-js-3.21.5-sources.tar.gz... ok
> fetching openwhisk-client-js-3.21.5-sources.tar.gz.asc... ok
> fetching openwhisk-client-js-3.21.5-sources.tar.gz.sha512... ok
> fetching apache license... ok
> fetching release keys... ok
> importing keys... ok (new keys imported (processed 13 unchanged 11))
> gpg: key 72AF0CC22C4CF320: "Vincent Hou (Release manager of OpenWhisk) <
> houshen...@apache.org>" not changed
> gpg: key 22907064147F886E: "Dave Grove " not changed
> gpg: key 44667BC927C86D51: "Rodric Rabbah " not changed
> gpg: key B1457C3D7101CC78: "James Thomas " not
> changed
> gpg: key A600E3331427515D: "Olivier Tardieu " not
> changed
> gpg: key 804F627B2D1BD1A0: 6 signatures not checked due to missing keys
> gpg: key 804F627B2D1BD1A0: "Alexander Klimetschek (Adobe Work Email) <
> aklim...@adobe.com>" not changed
> gpg: key 758C332F8D30E5A2: 1 signature not checked due to a missing key
> gpg: key 758C332F8D30E5A2: "Alexander Klimetschek (Apache Committer Key) <
> alex...@apache.org>" not changed
> gpg: key B5B8ADA933BB2FFF: "Dominic Kim " not changed
> gpg: key A1F071AF3F62EEFF: 3 signatures not checked due to missing keys
> gpg: key A1F071AF3F62EEFF: "keybase.io/akrabat " not
> changed
> gpg: key 395282A61D88D0AC: "Matt Rutkowski " not
> changed
> gpg: key 7050DAD4D8D21A6B: "Shawn Black " not
> changed
> gpg: key 1683F2D3AF54F2F1: public key "Tyson Norris <
> tysonnor...@apache.org>" imported
> gpg: key 44FA19E603F812E5: public key "Cosmin Stanciu (CODE SIGNING KEY) <
> stan...@apache.org>" imported
> gpg: Total number processed: 13
> gpg:   imported: 2
> gpg:  unchanged: 11
> unpacking tar ball... ok
> cloning scancode... ok
> computing sha512 for openwhisk-client-js-3.21.5-sources.tar.gz... ok
> openwhisk-client-js-3.21.5-sources.tar.gz:
> 06B7D8FB 6F5DC5CC A3BDFB08 6952FD10 8478EC3D 3A16B2C9 200D0157 0A3D2548
> 01DD9030
>  6FDDA0E7 EEF39D8E 594504AD 069AFD40 C85F3F09 CB5E4809 153F6D2B
> validating sha512... passed
> verifying asc... passed (signed-by: Cosmin Stanciu (CODE SIGNING KEY) <
> stan...@apache.org>)
> verifying notice... passed
> verifying absence of DISCLAIMER.txt passed
> verifying license... passed
> verifying sources have proper headers... passed
> scanning for executable files... passed
> scanning for unexpected file types... passed
> scanning for archives... passed
> scanning for packages... passed
> scanning package.json for version match... passed
> scanning package-lock.json for version match... passed
> removing the scratch space
> (/var/folders/bc/p62kjnm12n9_l30qfxv7p9lcgn/T/tmp.Mgdl0xz4)... ok
>


Re: [DISCUSS] OpenWhisk 21.05 / OpenWhisk 1.1

2021-06-10 Thread Tyson Norris
I would suggest removing MesosContainerFactory and its dependencies on 
mesos-actor. We are not using it, and I’m sure nobody else is either.
If you have already made changes to accommodate this, feel free to include 
them; otherwise I can create a PR just for this.

Thanks
Tyson

From: Rodric Rabbah 
Date: Thursday, June 10, 2021 at 8:14 AM
To: dev@openwhisk.apache.org 
Subject: Re: [DISCUSS] OpenWhisk 21.05 / OpenWhisk 1.1
A number of these dependencies are coming from
com.adobe.api.platform.runtime:mesos-actor:0.0.17. I tried a more recent
version 0.0.25 (still from 2019 though) and that does not resolve the
dependency issue.

Is there still a need for the mesos actor library?
I am tracking down one other conflicting dependency.

-r

On Thu, Jun 10, 2021 at 9:31 AM David P Grove  wrote:

>
>
> "David P Grove"  wrote on 06/04/2021 08:33:37 PM:
> >
> > Well, May went by fast :)
> >
> > Is a core OpenWhisk release in June feasible?   I know there has been a
> > fair amount of activity, but I don't have a good sense of whether it is
> > converging on a good point to release or not.
> >
> > --dave
>
> The upgrade in akka versions in the core repository broke all the
> downstream repos that consume the test suite from the core.
>
> Therefore, we can't do a unified release unless we either:
>(a) branch the core repo prior to this change
> or (b) fix all of downstream repos and then do a release wave.
>
> That combined with Dominic's comments on the status of the scheduler
> changes suggests we are probably looking at a 21.08 in August
> as the earliest possible unified release.
>
> --dave
>


Re: [discuss]take prewarmed container's memory as used memory

2020-05-26 Thread Tyson Norris
I agree this is a good change, but it may require operators to update their 
configured userMemory and/or their prewarm configs to avoid unexpected 
problems. It might be good to add some log statements reporting the userMemory 
available AFTER the initial prewarm config is applied, to hint that the 
userMemory seen at the invoker is now lower than before with the same configs?

Thanks
Tyson

On 5/24/20, 11:35 PM, "甯尤刚"  wrote:

Hi, guys,

  Now that this PR (Adjust prewarm container dynamically):
https://github.com/apache/openwhisk/pull/4871
is merged, I think it is a good chance to add this feature (take prewarmed 
container's memory as used memory):
https://github.com/apache/openwhisk/pull/4911

For the master branch, if an invoker has `16GB` of physical memory and we 
configure 12GB of user memory plus a 5GB prewarm pool, the invoker will 
currently try to create containers using up to 17GB of memory, which may end 
in OOM.

After applying this feature, the user just configures the invoker memory to a 
precise value: `16GB`. When `used memory` reaches `16GB`, no more containers 
will be created, which means `16GB` is the maximum memory usage for 
containers; this avoids `OOM`.

Another benefit is that this makes the user calculate the prewarmed 
containers' memory carefully and configure the invoker memory more accurately:
```
"CONFIG_whisk_containerPool_userMemory": "{{ 
hostvars[groups['invokers'][invoker_index | int]].user_memory | 
default(invoker.userMemory) }}"
```
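
The memory accounting described above can be sketched as follows (illustrative 
JavaScript, not the actual OpenWhisk scheduler code; the function name and the 
numbers are assumptions based on the example in this thread):

```javascript
// Units are MB. Counting prewarmed containers as "used" memory caps the
// total container memory at the configured invoker memory.
const GB = 1024;

function canSchedule(userMemoryMB, prewarmMB, busyMB, actionMB) {
  // Before the change, only busy containers counted against userMemory,
  // so a 12GB userMemory + 5GB prewarm pool could reach 17GB on a 16GB host.
  // After the change, prewarm + busy + the new container must all fit.
  return prewarmMB + busyMB + actionMB <= userMemoryMB;
}

// A 16GB host with userMemory=16GB and a 5GB prewarm pool:
console.log(canSchedule(16 * GB, 5 * GB, 10 * GB, 256)); // true: 15.25GB fits
console.log(canSchedule(16 * GB, 5 * GB, 11 * GB, 256)); // false: exceeds 16GB
```

With this accounting, the configured invoker memory is a hard ceiling on total 
container memory, regardless of how large the prewarm pool is.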

Do you have any ideas or suggestions for this PR (take prewarmed container's 
memory as used memory):
https://github.com/apache/openwhisk/pull/4911 ?



Re: [FeedBack]Adjust prewarm container dynamically

2020-05-19 Thread Tyson Norris
Thanks Ning!
I have approved the PR. If there are no further comments or change requests by 
tomorrow, I will merge it, unless somebody else pushes the button first.
Tyson

On 5/18/20, 8:34 PM, "Rodric Rabbah"  wrote:

I've watched this PR as it has evolved - very nice work and contribution
(and thanks Tyson also for all the detailed guidance you provided).

-r

On Mon, May 18, 2020 at 11:28 PM 甯尤刚  wrote:

> Hi, guys
> this is the feature: Adjust prewarm container dynamically:
> 
https://github.com/apache/openwhisk/pull/4871
> Currently, this PR is close to done.
> I look forward to receiving your feedback, thanks!
>



Re: [VOTE] Release Apache OpenWhisk Client Js (v3.21.2, rc1)

2020-05-05 Thread Tyson Norris
+1 to release OW client js v3.21.2, rc1

Please vote to approve this release:

  [x] +1 Approve the release
  [ ]  0 Don't care
  [ ] -1 Don't release, because ...

Release verification checklist for reference:
  [x] Download links are valid.
  [x] Checksums and PGP signatures are valid.
  [x] Source code artifacts have correct names matching the current release.
  [x] LICENSE and NOTICE files are correct for each OpenWhisk repository.
  [x] All files have license headers as specified by OpenWhisk project
policy [1].
  [x] No compiled archives bundled in source archive.

On 5/4/20, 6:00 PM, "Rodric Rabbah"  wrote:

Hi,

This is a call to vote on releasing version 3.21.2 release candidate rc1 of
the following project module with artifacts built from the Git repositories
and commit IDs listed below.

* OpenWhisk Client Js: eaa43743648c4ff69f53af95befc9bd314178d57


https://github.com/apache/openwhisk-client-js/commits/eaa43743648c4ff69f53af95befc9bd314178d57


https://dist.apache.org/repos/dist/dev/openwhisk/rc1/openwhisk-client-js-3.21.2-sources.tar.gz


https://dist.apache.org/repos/dist/dev/openwhisk/rc1/openwhisk-client-js-3.21.2-sources.tar.gz.asc


https://dist.apache.org/repos/dist/dev/openwhisk/rc1/openwhisk-client-js-3.21.2-sources.tar.gz.sha512

This release is comprised of source code distribution only.

You can use this UNIX script to download the release and verify the
checklist below:

https://gitbox.apache.org/repos/asf?p=openwhisk-release.git;a=blob_plain;f=tools/rcverify.sh;hb=56445f1

Usage:
curl -s "https://gitbox.apache.org/repos/asf?p=openwhisk-release.git;a=blob_plain;f=tools/rcverify.sh;hb=56445f1" -o rcverify.sh
chmod +x rcverify.sh
rcverify.sh openwhisk-client-js 'OpenWhisk Client Js' 3.21.2 rc1

Please vote to approve this release:

  [ ] +1 Approve the release
  [ ]  0 Don't care
  [ ] -1 Don't release, because ...

Release verification checklist for reference:
  [ ] Download links are valid.
  [ ] Checksums and PGP signatures are valid.
  [ ] Source code artifacts have correct names matching the current release.
  [ ] LICENSE and NOTICE files are correct for each OpenWhisk repository.
  [ ] All files have license headers as specified by OpenWhisk project
policy [1].
  [ ] No compiled archives bundled in source archive.

This majority vote is open for at least 72 hours.


[1]

https://github.com/apache/openwhisk-release/blob/master/docs/license_compliance.md



Re: Transaction ID in js client

2020-04-10 Thread Tyson Norris
Yes, a sequence gets the same transaction id for all activations. If you had a 
sequence that uses the JS SDK to launch other activations, they too would now 
get the same transaction id.
This all sounds more useful than not, so unless there are objections, I'll 
merge the PR as is on Monday.

Thanks
Tyson

On 4/9/20, 8:43 PM, "Alexander Klimetschek"  wrote:

IIRC the __OW_TRANSACTION_ID is currently set per sequence (each sequence = 
one transaction)? Could there be any negative consequences if this is 
overwritten?

(just thinking out loud, I probably don't know all the details)

Cheers,
Alex

From: Tyson Norris 
Sent: Monday, April 6, 2020 07:39
To: dev@openwhisk.apache.org 
Subject: Re: Transaction ID in js client

The current impl is that any request invoked via `this.client.request()` 
will get the x-request-id header set with the value from __OW_TRANSACTION_ID 
env var.

On 4/6/20, 7:32 AM, "Rodric Rabbah"  wrote:

I don't understand the PR - the amended headers are propagated to what 
HTTP
request?

-r

On Mon, Apr 6, 2020 at 10:23 AM Tyson Norris 
wrote:

> Hi –
> One of our customers wants to reuse the transaction id when a js 
action
> uses the openwhisk js client to invoke another action.
> This sounds reasonable to me, but I’m not sure if there is some 
argument
> to keep them as separate transaction ids?
>
> There is a PR already here
> 
https://github.com/apache/openwhisk-client-js/pull/208
 that I’m inclined
> to merge, but in case there is some reason to keep the transaction ids
> separate, I will ask them to update the PR to make it an opt-in feature.
>
> Thanks
> Tyson
>
>






Re: Transaction ID in js client

2020-04-06 Thread Tyson Norris
The current impl is that any request invoked via `this.client.request()` will 
get the x-request-id header set with the value from __OW_TRANSACTION_ID env var.
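
The propagation behavior described here can be sketched like this (illustrative 
code; `withTransactionId` is a hypothetical helper, not the actual 
openwhisk-client-js internals):

```javascript
// Copy the transaction id from the action's environment into the outgoing
// request headers, so invocations made from inside an action carry the
// same transaction id as the invoking activation.
function withTransactionId(headers = {}) {
  const txId = process.env.__OW_TRANSACTION_ID;
  if (txId && !headers['x-request-id']) {
    headers['x-request-id'] = txId; // only set when running inside an action
  }
  return headers;
}

process.env.__OW_TRANSACTION_ID = 'tid-123';
console.log(withTransactionId({ accept: 'application/json' }));
// → { accept: 'application/json', 'x-request-id': 'tid-123' }
```

Outside an action (no __OW_TRANSACTION_ID set), the headers pass through 
unchanged, so the controller assigns a fresh transaction id as before.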

On 4/6/20, 7:32 AM, "Rodric Rabbah"  wrote:

I don't understand the PR - the amended headers are propagated to what HTTP
request?

-r

On Mon, Apr 6, 2020 at 10:23 AM Tyson Norris 
wrote:

> Hi –
> One of our customers wants to reuse the transaction id when a js action
> uses the openwhisk js client to invoke another action.
> This sounds reasonable to me, but I’m not sure if there is some argument
> to keep them as separate transaction ids?
>
> There is a PR already here
> 
https://github.com/apache/openwhisk-client-js/pull/208
 that I’m inclined
> to merge, but in case there is some reason to keep the transaction ids
> separate, I will ask them to update the PR to make it an opt-in feature.
>
> Thanks
> Tyson
>
>




Transaction ID in js client

2020-04-06 Thread Tyson Norris
Hi –
One of our customers wants to reuse the transaction id when a js action uses 
the openwhisk js client to invoke another action.
This sounds reasonable to me, but I’m not sure if there is some argument to 
keep them as separate transaction ids?

There is a PR already here 
https://github.com/apache/openwhisk-client-js/pull/208 that I’m inclined to 
merge, but in case there is some reason to keep the transaction ids separate, 
I will ask them to update the PR to make it an opt-in feature.

Thanks
Tyson



Re: Welcome new committer: Neeraj Mangal

2020-03-13 Thread Tyson Norris
Welcome Neeraj! Congratulations and thanks for all your help! 
Tyson

> On Mar 12, 2020, at 9:13 PM, Dragos Dascalita Haut 
>  wrote:
> 
> The Project Management Committee (PMC) for Apache OpenWhisk
> has invited Neeraj to become a committer and we are pleased
> to announce that he has accepted.
> 
> Please join me in welcoming Neeraj to his new role on the project !
> 
> Neeraj has been contributing to the Kube deployment, the CLI, and the main 
> project.
> 


Re: Welcome new committer Alex Klimetschek

2020-03-13 Thread Tyson Norris
Welcome Alex!
Your support for both Adobe Runtime and OpenWhisk has been awesome!
Tyson

> On Mar 13, 2020, at 3:35 AM, Bertrand Delacretaz  
> wrote:
> 
> Welcome Alex!
> 
>> On Fri, Mar 13, 2020 at 4:40 AM Alexander Klimetschek
>>  wrote:
>> ...In not so small parts because it was a fun thing to build - see more at 
>> [4]! And this is how I ended up here :-)
> 
> I was trying to remember when we first "met" around Apache and
> lists.apache.org helped me find these messages around Apache Cocoon
> back in 2006...those were fun times already and thanks for keeping the
> fun factor in your work!
> 
> -Bertrand


Re: Welcome new committer: Dan McWeeny

2020-02-25 Thread Tyson Norris
Woohoo! Welcome Dan!

On 2/25/20, 11:33 AM, "Dragos Dascalita Haut"  
wrote:

It is my pleasure to share that the OpenWhisk PPMC
has elected Dan McWeeny as a Committer,
based on his ongoing and valuable contributions to the project,
the most recent ones being around moving to Java 11, and encrypting default 
params.

Dan has accepted the invitation.

Please join me in welcoming Dan to his new role on the project !

Being a committer enables easier contribution to the project
since there is no need to go via the patch submission process.
This should enable better productivity.
Being a PMC member enables assistance with the management
and to guide the direction of the project.


dragos





Re: Config runtime

2020-01-09 Thread Tyson Norris
Hi - 
I would rather see this as an automated + configurable feature, rather than an 
API that is manually invoked with a user making (possibly bad) decisions.
I created an issue to describe this here 
https://github.com/apache/openwhisk/issues/4725

Part of the reason for automating this is that, in addition to having a 
deficit of prewarms, we also experience problems from having a surplus of 
prewarms; in the case of blackbox-dedicated invokers, those surplus prewarms 
may never be used and just waste resources. If there were an automated way to 
scale down the unused prewarms, this resource usage would be temporary.

Thanks
Tyson

On 1/6/20, 7:29 PM, "甯尤刚"  wrote:

Hello Everyone:
​
I submitted a WIP patch here:
https://github.com/apache/openwhisk/pull/4790
​
Sometimes an admin may want to reinitialize the runtime config. For example, 
if the nodejs:10 prewarm container count is too low, leading to cold starts, 
the admin may want to reinitialize the runtime configuration to increase the 
number of nodejs:10 prewarm containers so that users' requests are handled as 
soon as possible. An admin may also want to reinitialize the runtime config on 
only some invokers.

Currently the patch just covers the basic functionality and works well 
locally, but I have not added test cases yet.
If the direction of this patch is ok, I will finish the corresponding test 
cases.
​
Best Regards,
ning.yougang
​
​




Re: [VOTE] Release Apache OpenWhisk Runtime Dotnet (v1.14.0, rc1)

2020-01-02 Thread Tyson Norris
+1 to release  Apache OpenWhisk Runtime Dotnet (v1.14.0, rc1)
All tests passed via rcverify.sh
Thanks
Tyson

On 12/29/19, 3:18 PM, "Shawn Black"  wrote:

+1

computing sha512 for openwhisk-runtime-dotnet-1.14.0-sources.tar.gz
SHA512: openwhisk-runtime-dotnet-1.14.0-sources.tar.gz: 
2359D1BB 46C54D1A 5C637805 8D40E8D0 107F07B7 96C14B0C F9DE4CE8 B1A812DE 
DB9ACD86
 1B37E72F 332AE5A2 00F23F63 63B931AF 07E96E7B 02288F8C A9EA9BD0
validating sha512... passed
verifying asc... passed (signed-by: Dave Grove )
verifying notice... passed
verifying absence of DISCLAIMER.txt passed
verifying license... passed
verifying sources have proper headers... passed
scanning for executable files... passed
scanning for unexpected file types... passed
scanning for archives... passed
scanning for packages... passed


On 2019/12/27 19:55:51, "David P Grove"  wrote: 
> 
> 
> Hi,
> 
> This is a call to vote on releasing version 1.14.0 release candidate rc1 
of
> the following project module with artifacts built from the Git 
repositories
> and commit IDs listed below.
> 
> * OpenWhisk Runtime Dotnet: a9b70ca4b194bb21691da0f8a4131a765ea6ab19
> 
> 
https://github.com/apache/openwhisk-runtime-dotnet/commits/a9b70ca4b194bb21691da0f8a4131a765ea6ab19
> 
> 
https://dist.apache.org/repos/dist/dev/openwhisk/rc1/openwhisk-runtime-dotnet-1.14.0-sources.tar.gz
> 
> 
https://dist.apache.org/repos/dist/dev/openwhisk/rc1/openwhisk-runtime-dotnet-1.14.0-sources.tar.gz.asc
> 
> 
https://dist.apache.org/repos/dist/dev/openwhisk/rc1/openwhisk-runtime-dotnet-1.14.0-sources.tar.gz.sha512
> 
> This release is comprised of source code distribution only.
> 
> You can use this UNIX script to download the release and verify the
> checklist below:
> 
https://gitbox.apache.org/repos/asf?p=openwhisk-release.git;a=blob_plain;f=tools/rcverify.sh;hb=25a8d86
> 
> Usage:
> curl -s "https://gitbox.apache.org/repos/asf?p=openwhisk-release.git;a=blob_plain;f=tools/rcverify.sh;hb=25a8d86" -o rcverify.sh
> chmod +x rcverify.sh
> rcverify.sh openwhisk-runtime-dotnet 'OpenWhisk Runtime Dotnet' 1.14.0 rc1
> 
> Please vote to approve this release:
> 
>   [ ] +1 Approve the release
>   [ ]  0 Don't care
>   [ ] -1 Don't release, because ...
> 
> Release verification checklist for reference:
>   [ ] Download links are valid.
>   [ ] Checksums and PGP signatures are valid.
>   [ ] Source code artifacts have correct names matching the current
> release.
>   [ ] LICENSE and NOTICE files are correct for each OpenWhisk repository.
>   [ ] All files have license headers as specified by OpenWhisk project
> policy [1].
>   [ ] No compiled archives bundled in source archive.
> 
> This majority vote is open for at least 72 hours.
> 
> 
> [1]
> 
https://github.com/apache/openwhisk-release/blob/master/docs/license_compliance.md
> 




Re: Welcome new committer: Cosmin Stanciu

2019-12-16 Thread Tyson Norris
WOO WOO! Welcome and congratulations Cosmin!!! Thanks for all your work!

On 12/16/19, 1:47 PM, "Dragos Dascalita Haut"  
wrote:

It is my pleasure to share that the OpenWhisk PPMC
has elected Cosmin Stanciu as a Committer,
based on his ongoing and valuable contributions to the project,
especially around user events, monitoring and kubernetes deployment.

Cosmin has accepted the invitation.

Please join me in welcoming Cosmin to his new role on the project !

Being a committer enables easier contribution to the project
since there is no need to go via the patch submission process.
This should enable better productivity.
Being a PMC member enables assistance with the management
and to guide the direction of the project.


dragos




Re: Welcome new Committer Pengcheng Jiang

2019-12-05 Thread Tyson Norris
Welcome Pengcheng! Thank you!

On Wed, Dec 4, 2019 at 11:43 PM Rob Allen  wrote:

> Welcome Pengcheng!
>
> > On 5 Dec 2019, at 05:31, Dominic Kim  wrote:
> >
> > It is my pleasure to share that the OpenWhisk PMC has elected Pengcheng
> > Jiang
> > as a Committer, based on his ongoing and valuable contributions to the
> > project. Pengcheng has accepted the invitation.
> >
> > Pengcheng has been a solid Contributor to and active community member
> > within the OpenWhisk project for several years. He has contributed to
> many
> > parts of the project across many repos. He is credited with improving
> > critical path such as ensuring the result message, adding unknown
> fallback
> > kind, completing blocking activations and storage parts such as enabling
> > CouchDB cluster with erlang cookie, using the artifact store to
> manipulate
> > data, MongoDB artifact store(even if we did not pursue it for some
> reason)
> > and the on-going work, ElasticSearch activation store. He also has been
> > actively participating in the discussion via Slack or dev list and
> helping
> > newbies to soft-land to the project in many directions.
> >
> > Please join me in welcoming Pengcheng to his new role on the project.
> >
> > Best regards
> > Dominic
>
>


Re: Adding OW_ACTION_VERSION

2019-12-03 Thread Tyson Norris
Hi -
Just to clarify, your examples mention "cache keys" - can you confirm these 
keys are stored external to the action code/container? I guess the behavior you 
are after is that a cache is populated the first time a particular version of 
the action is invoked, and the action is responsible for populating the cache? 
This will require that it is acceptable for the cache to be overwritten by 
multiple concurrent action invocations - is that ok? Otherwise some external 
coordination will be required.
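A minimal sketch of the version-scoped cache key idea discussed above (a hypothetical helper, not code from the PR; the `__OW_*` names follow the usual runtime convention, and `OW_ACTION_VERSION` is the name used in this thread):

```python
import os

def cache_key(env=None):
    """Build a cache key that is scoped to one version of one action."""
    env = env if env is not None else os.environ
    ns = env.get("__OW_NAMESPACE", "guest")
    action = env.get("__OW_ACTION_NAME", "unknown")
    version = env.get("OW_ACTION_VERSION", "0.0.1")
    # Including the version means each new deployment starts from a cold
    # cache instead of reading entries written by the previous version.
    return f"{ns}/{action}@{version}"
```

Note that this only namespaces the entries; it does not coordinate concurrent writers, which is exactly the point raised above.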

Thanks
Tyson

On 12/3/19, 1:56 AM, "Christophe Jelger"  wrote:

Hello,

I just joined the dev list after opening 
https://github.com/apache/openwhisk/pull/4761 : it adds OW_ACTION_VERSION to the 
execution environment.

I was asked by Rodric to send this email to the list because this change 
will require extra changes in the runtime repos for tests (see his comment in 
the PR).
Since I just joined the list and have until now only been an active user of 
OpenWhisk, I don’t know what the process is to do that next. Any hint is 
appreciated.

BR,
Christophe
--
Christophe Jelger | AEM Commerce Developer
Adobe Research Schweiz AG
+41 (0)61 226 5792





Re: Welcome new Committer Shawn Black

2019-12-02 Thread Tyson Norris
Welcome Shawn!!!
Best,
Tyson

On 12/2/19, 5:24 PM, "Dominic Kim"  wrote:

Congrats!
Welcome, Shawn~! :)

Regards
Dominic


On Tue, Dec 3, 2019 at 8:19 AM Rodric Rabbah wrote:

> It is my pleasure to share that the OpenWhisk PPMC has elected Shawn Black
> as a Committer, based on his ongoing and valuable contributions to the
> project around the .NET runtime. Shawn has accepted the invitation.
>
> Please join me in welcoming Shawn to his new role on the project.
>
> -r
>




Re: Pool slot was shut down

2019-12-02 Thread Tyson Norris
Akka-http 10.1.11 is released
I created https://github.com/apache/openwhisk/pull/4759
Have not verified that this is fixed with the change, but hopefully it is.

On 11/27/19, 11:04 PM, "Markus Thömmes"  wrote:

The fix you shared seems to be only logging related so it seems that these
logs are harmless? If we don't see any detrimental effects to actual usage
I'd vote for waiting for the fix to be released to just shut down the noisy
logging. If we see actual errors though, we of course need to look at
reverting the change (and potential collapsing changes too).

Sorry for the noise.

On Wed, Nov 27, 2019 at 10:06 PM David P Grove wrote:

> Doing a little googling, I can see an issue [1] in akka-http with a merged
> fix [2] that looks like it might be related.   Also looks like akka-http
> 10.1.11 is about a week out from release [3].
>
> Not clear to me if that means we should revert our akka-http version bump
> until 10.1.11 is out and try again then or not.  Markus?
>
> --dave
>
> [1] https://github.com/akka/akka-http/issues/2728
> [2] https://github.com/akka/akka-http/pull/2816
> [3] https://github.com/akka/akka-http/issues/2836
>
> "Ruediger Maass"  wrote on 11/27/2019 03:37:07
> PM:
> >
> > Since a few days I observe many entries in the logs of our builds like
> > this
> >
> > [2019-11-25T09:30:56.217Z] [ERROR] Outgoing request stream error
> >
> > akka.http.impl.engine.client.pool.NewHostConnectionPool
> > $HostConnectionPoolStage$$anon$1$Slot$$anon$2:
> > Pool slot was shut down
> >
> > When looking at green Openwhisk builds (travis) I also observe these
> > problems - e.g. in
> > https://travis-ci.org/apache/openwhisk/jobs/616576483
> >
> > Something going wrong here?
> >
> > Thanks, Ruediger
> >
> >
> >
>




Re: Tech Interchange Meeting Wednesday!

2019-11-12 Thread Tyson Norris
I think I had the times wrong: can someone confirm after the time change?
7am PST
10am EST
4pm CET
3pm GMT
11pm Beijing
12am Seoul

Call: https://zoom.us/my/asfopenwhisk

Thanks
Tyson

On 11/11/19, 5:37 PM, "Tyson Norris"  wrote:

Hi Whiskers –
I will be hosting the biweekly Tech Interchange Call on Wednesday November 
13th at 10am Eastern 7am Pacific, 3pm Central Europe, 2pm GMT, 10pm Beijing, 
11pm Seoul.
Call on Zoom:

https://zoom.us/my/asfopenwhisk

Please submit any topics that you might like to cover.

Thanks!
Tyson




Re: ContainerPool buffering changes

2019-11-12 Thread Tyson Norris
Seeking someone to review this PR – I’ve been load testing it this week with 
good success.
https://github.com/apache/openwhisk/pull/4593

We can discuss at the call tomorrow if nobody reviews it before then.

Anyone?
Thanks
Tyson

From: Tyson Norris 
Date: Thursday, November 7, 2019 at 6:34 AM
To: "dev@openwhisk.apache.org" 
Subject: ContainerPool buffering changes

Hi –
I have a long outstanding PR to change buffer processing at ContainerPool 
https://github.com/apache/openwhisk/pull/4593
The background is that in cases where container scheduling fails, we should not 
immediately retry scheduling, but rather wait for a resource-affecting event to 
occur, and then retry. With the previous impl, we saw cases where scheduling 
would get into a tight loop and crash the invoker.

Please let me know if you have any concerns?

Thanks
Tyson


Tech Interchange Meeting Wednesday!

2019-11-11 Thread Tyson Norris
Hi Whiskers –
I will be hosting the biweekly Tech Interchange Call on Wednesday November 13th 
at 10am Eastern 7am Pacific, 3pm Central Europe, 2pm GMT, 10pm Beijing, 11pm 
Seoul.
Call on Zoom:
https://zoom.us/my/asfopenwhisk

Please submit any topics that you might like to cover.

Thanks!
Tyson


Re: OpenWhisk as a single docker image?

2019-11-09 Thread Tyson Norris
I suspect that due to the Docker-in-Docker scenario, it will be easier to use 
java+jar (+local docker) instead of running the jar in a container. 

Today you can start the jar with only java, but you will need a bunch of 
parameters (probably different per OS?) to run it in a container, I think.
Local docker client is switched per OS here 
https://github.com/apache/openwhisk/blob/231e739373ef681c44b5647a6956d5838a87db2e/core/invoker/src/main/scala/org/apache/openwhisk/core/containerpool/docker/StandaloneDockerContainerFactory.scala#L37
I guess this wouldn't apply if running in a container, but it arguably makes 
running the jar simpler than running the container IMHO.
I also suspect you won't get the behavior of launching the playground UI in a 
browser either, which I would miss. 

Tyson

On 11/9/19, 5:38 AM, "Michele Sciabarra"  wrote:

Wow. I missed those evolutions. So I guess it should not be hard to package 
it as a docker image. 

To be able to say to people: execute "docker run -p 8080:8080 
openwhisk/standalone" and enjoy...

If it is possible I can volunteer to write the Dockerfile to do that...

I have a question: does it use the local docker? Where is the invoker?


-- 
  Michele Sciabarra
  mich...@sciabarra.com

- Original message -
From: Rodric Rabbah 
To: dev@openwhisk.apache.org
Subject: Re: OpenWhisk as a single docker image?
Date: Saturday, November 09, 2019 2:31 PM

Do you mean the standalone controller? 
https://github.com/apache/openwhisk/blob/master/core/standalone/README.md

-r

> On Nov 9, 2019, at 8:18 AM, Michele Sciabarra  
wrote:
> 
> Hello all, 
> 
> I remember the discussion about the openwhisk as a single executable that 
includes also Kafka. So I wonder: is it now possible to run (for development 
purposes of course) OpenWhisk as single docker image if we add also couchdb to 
that one? Because I have an use case where even a docker-compose can be 
inconvenient...
> 
> -- 
>  Michele Sciabarra
>  mich...@sciabarra.com




Re: Action health checks

2019-11-07 Thread Tyson Norris
Hi - 
As discussed, I have updated the PR to reflect:
> - for prewarm, use the tcp connection for monitoring outside of activation
> workflow
> - for warm, handle it as a case of retry, where request *connection*
> failure only for /run, will be handled by way of rescheduling back to
> ContainerPool (/init should already be handled by retry for a time 
period).

Please review and provide any feedback.
https://github.com/apache/openwhisk/pull/4698

Thanks!
Tyson

On 10/30/19, 9:03 AM, "Markus Thömmes"  wrote:

Yes, I used the word "retry" here to mean "reschedule to another
container", just like you would if the healthiness probe failed.

A word of caution: TCP probes might be behaving strangely in a container
setting. They sometimes accept connections even though nothing is listening
and stuff like that.

    On Wed, Oct 30, 2019 at 4:34 PM Tyson Norris wrote:

> I don't think "retry" is the right handling for warm connection failures -
> if a connection cannot be made due to container crash/removal, it won't
> suddenly come back. I would instead treat it as a "reschedule", where the
> failure routes the activation back to ContainerPool, to be scheduled to a
> different container. I'm not sure how distinct we can be on detecting
> container failure vs temporary network issue that may or may not resolve
> on its own, so I would treat them the same, and assume the container is
> gone.
>
> So for this PR, is there any objection to:
> - for prewarm, use the tcp connection for monitoring outside of activation
> workflow
> - for warm, handle it as a case of retry, where request *connection*
> failure only for /run, will be handled by way of rescheduling back to
> ContainerPool (/init should already be handled by retry for a time 
period).
>
> Thanks!
> Tyson
>
> On 10/30/19, 7:03 AM, "Markus Thömmes"  wrote:
>
> Increasing latency would be my biggest concern here as well. With a
> health
> ping, we can't even be sure that a container is still healthy for the
> "real
> request". To guarantee that, I'd still propose to have a look at the
> possible failure modes and implement a retry mechanism on them. If you
> get
> a "connection refused" error, I'm fairly certain that it can be 
retried
> without harm. In fact, any error where we can guarantee that we 
haven't
> actually reached the container can be safely retried in the described
> way.
>
> Pre-warmed containers indeed are somewhat of a different story. A
> health
> ping as mentioned here would for sure help there, be it just a TCP
> probe or
> even a full-fledged /health call. I'd be fine with either way in this
> case
> as it doesn't affect the critical path.
>
> On Tue, Oct 29, 2019 at 6:00 PM Tyson Norris wrote:
>
> > By "critical path" you mean the path during action invocation?
> > The current PR only introduces latency on that path for the case of 
a
> > Paused container changing to Running state (once per transition from
> Paused
> > -> Running).
> > In case it isn't clear, this change does not affect any retry (or
> lack of
> > retry) behavior.
> >
> > Thanks
> > Tyson
> >
> > On 10/29/19, 9:38 AM, "Rodric Rabbah"  wrote:
> >
> > as a longer term point to consider, i think the current model of
> "best
> > effort at most once" was the wrong design point - if we embraced
> > failure
> > and just retried (at least once), then failure at this level
> would
> > lead to
    > > retries which is reasonable.
> >
> > if we added a third health route or introduced a health check,
> would we
> > increase the critical path?
> >
> > -r
> >
> > On Tue, Oct 29, 2019 at 12:29 PM David P Grove <
> gro...@us.ibm.com>
> > wrote:
> >
> > > Tyson Norris  wrote on 10/28/2019
> > 11:17:50 AM:
> > > > I'm curious to know what other
> > > > folks think about "generic active probing from invoker" vs
> "docker/
> > > > mesos/k8s specific integrations for reacting to container
> > failures"?
> > > >
> > >
> > > From a pure maintenance and testing perspective I think a
> single
> > common
> > > mechanism would be best if we can do it with acceptable 
runtime
> > overhead.
> > >
> > > --dave
> > >
> >
> >
> >
>
>
>




ContainerPool buffering changes

2019-11-07 Thread Tyson Norris
Hi –
I have a long outstanding PR to change buffer processing at ContainerPool 
https://github.com/apache/openwhisk/pull/4593
The background is that in cases where container scheduling fails, we should not 
immediately retry scheduling, but rather wait for a resource-affecting event to 
occur, and then retry. With the previous impl, we saw cases where scheduling 
would get into a tight loop and crash the invoker.
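The tight-loop problem and the event-driven fix can be sketched roughly as follows (illustrative names only, not the actual ContainerPool code): a failed scheduling attempt parks the job in a buffer, and the buffer is re-processed only when a resource-affecting event arrives.

```python
from collections import deque

class Pool:
    def __init__(self, free_slots):
        self.free_slots = free_slots
        self.buffer = deque()

    def submit(self, job):
        # Park the job on failure; do NOT retry immediately in a loop.
        if not self.try_schedule(job):
            self.buffer.append(job)

    def try_schedule(self, job):
        if self.free_slots > 0:
            self.free_slots -= 1
            return True
        return False

    def on_resource_event(self, freed_slots):
        # e.g. a container completed, was removed, or resources changed:
        # only now is the buffer drained.
        self.free_slots += freed_slots
        while self.buffer and self.free_slots > 0:
            self.try_schedule(self.buffer.popleft())
```

The key property is that no CPU is spent retrying while nothing about the pool's resources has changed.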

Please let me know if you have any concerns?

Thanks
Tyson


Re: Action health checks

2019-10-30 Thread Tyson Norris
I don't think "retry" is the right handling for warm connection failures - if a 
connection cannot be made due to container crash/removal, it won't suddenly 
come back. I would instead treat it as a "reschedule", where the failure routes 
the activation back to ContainerPool, to be scheduled to a different container. 
I'm not sure how distinct we can be on detecting container failure vs 
temporary network issue that may or may not resolve on its own, so I would 
treat them the same, and assume the container is gone.

So for this PR, is there any objection to:
- for prewarm, use the tcp connection for monitoring outside of activation 
workflow
- for warm, handle it as a case of retry, where request *connection* failure 
only for /run, will be handled by way of rescheduling back to ContainerPool 
(/init should already be handled by retry for a time period).
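The proposed handling above can be condensed into a small decision function (a hedged sketch with illustrative names, not the actual ContainerProxy implementation): only a connection-level failure on /run is rescheduled, /init keeps its existing time-bounded retry, and everything else surfaces as an activation error.

```python
def handle_failure(endpoint, is_connection_failure, retry_deadline_passed=False):
    """Decide what to do when a request to an action container fails."""
    if endpoint == "/run" and is_connection_failure:
        return "reschedule"  # container assumed gone; route back to the pool
    if endpoint == "/init" and not retry_deadline_passed:
        return "retry"       # existing time-bounded init retry
    return "fail"            # genuine error: report the activation error
```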

Thanks!
Tyson

On 10/30/19, 7:03 AM, "Markus Thömmes"  wrote:

Increasing latency would be my biggest concern here as well. With a health
ping, we can't even be sure that a container is still healthy for the "real
request". To guarantee that, I'd still propose to have a look at the
possible failure modes and implement a retry mechanism on them. If you get
a "connection refused" error, I'm fairly certain that it can be retried
without harm. In fact, any error where we can guarantee that we haven't
actually reached the container can be safely retried in the described way.

Pre-warmed containers indeed are somewhat of a different story. A health
ping as mentioned here would for sure help there, be it just a TCP probe or
even a full-fledged /health call. I'd be fine with either way in this case
as it doesn't affect the critical path.
    
    On Tue, Oct 29, 2019 at 6:00 PM Tyson Norris wrote:

> By "critical path" you mean the path during action invocation?
> The current PR only introduces latency on that path for the case of a
> Paused container changing to Running state (once per transition from 
Paused
> -> Running).
> In case it isn't clear, this change does not affect any retry (or lack of
> retry) behavior.
>
> Thanks
> Tyson
>
> On 10/29/19, 9:38 AM, "Rodric Rabbah"  wrote:
>
> as a longer term point to consider, i think the current model of "best
> effort at most once" was the wrong design point - if we embraced
> failure
> and just retried (at least once), then failure at this level would
> lead to
> retries which is reasonable.
>
> if we added a third health route or introduced a health check, would 
we
> increase the critical path?
>
> -r
>
> On Tue, Oct 29, 2019 at 12:29 PM David P Grove 
> wrote:
>
> > Tyson Norris  wrote on 10/28/2019
> 11:17:50 AM:
> > > I'm curious to know what other
> > > folks think about "generic active probing from invoker" vs 
"docker/
> > > mesos/k8s specific integrations for reacting to container
> failures"?
> > >
> >
> > From a pure maintenance and testing perspective I think a single
> common
> > mechanism would be best if we can do it with acceptable runtime
> overhead.
> >
> > --dave
> >
>
>
>




Re: Action health checks

2019-10-29 Thread Tyson Norris
By "critical path" you mean the path during action invocation? 
The current PR only introduces latency on that path for the case of a Paused 
container changing to Running state (once per transition from Paused -> 
Running). 
In case it isn't clear, this change does not affect any retry (or lack of 
retry) behavior.

Thanks
Tyson

On 10/29/19, 9:38 AM, "Rodric Rabbah"  wrote:

as a longer term point to consider, i think the current model of "best
effort at most once" was the wrong design point - if we embraced failure
and just retried (at least once), then failure at this level would lead to
retries which is reasonable.

if we added a third health route or introduced a health check, would we
increase the critical path?

-r

On Tue, Oct 29, 2019 at 12:29 PM David P Grove  wrote:

> Tyson Norris  wrote on 10/28/2019 11:17:50 AM:
> > I'm curious to know what other
> > folks think about "generic active probing from invoker" vs "docker/
> > mesos/k8s specific integrations for reacting to container failures"?
> >
>
> From a pure maintenance and testing perspective I think a single common
> mechanism would be best if we can do it with acceptable runtime overhead.
>
> --dave
>




Re: Action health checks

2019-10-28 Thread Tyson Norris
Hi Markus - 
The failures are generic and we haven't seen a real cause as of yet, on mesos 
we get an error of "Container exited with status 125". We continue to 
investigate that of course, but containers may die for any number of reasons so 
we should just plan on them dying. We do get an event from mesos already on 
these failures, and I'm sure we can integrate with Kubernetes to react as well, 
but I thought it might be better to make this probing simpler and consistent 
e.g. where DockerContainerFactory can be treated the same way. If nothing else, 
it is certainly easier to test. I'm curious to know what other folks think 
about "generic active probing from invoker" vs "docker/mesos/k8s specific 
integrations for reacting to container failures"?

RE HTTP requests - For prewarm, we cannot add this check there, since e.g. if 
20 prewarms fail for this invoker, a single activation might try each of those 
twenty before getting a working container, which seems like bad behavior 
compared to preemptively validating the container and replacing it outside the 
HTTP workflow for prewarms. For warm containers, it would be more feasible to 
do this but we would need to distinguish "/run after resume" from "/run before 
pause", and provide a special error case for connection failure after resume 
since we cannot treat all warm container failures as retriable - only once 
after resume.  This seemed more complicated than explicitly checking it once 
after resume inside ContainerProxy.  One possible change would be to move the 
checking logic inside either Container or ContainerClient, but I would keep it 
separate from /init and /run, and consider revisiting it if we change the HTTP 
protocol to include some more sophisticated checking via HTTP ( add a /health 
endpoint etc). 

Thanks
Tyson


On 10/28/19, 2:21 AM, "Markus Thömmes"  wrote:

Heya,

thanks for the elaborate proposal.

Do you have any more information on why these containers are dying off in
the first place? In the case of Kubernetes/Mesos I could imagine we might
want to keep the Invoker's state consistent by checking it against the
respective API repeatedly. On Kubernetes for instance, you could setup an
informer that'd inform you about any state changes on the pods that this
Invoker has spawned. If a prewarm container dies this way, we can simply
remove it from the Invoker's bookkeeping and trigger a backfill.

Secondly, could we potentially fold this check into the HTTP requests
themselves? If we get a "connection refused" on an action that we knew
worked before, we can safely retry. There should be a set of exceptions
that our HTTP clients should surface that should be safe for us to retry in
the invoker anyway. The only addition you'd need in this case is an
enhancement on the ContainerProxy's state machine I believe, that allows
for such a retrying use-case. The "connection refused" use-case I mentioned
should be equivalent to the TCP probe you're doing now.

WDYT?

    Cheers,
Markus

On Sun, Oct 27, 2019 at 2:56 AM Tyson Norris wrote:

> Hi Whiskers –
> We periodically have an unfortunate problem where a docker container (or
> worse, many of them) dies off unexpectedly, outside of HTTP usage from
> invoker. In these cases, prewarm or warm containers may still have
> references at the Invoker, and eventually if an activation arrives that
> matches those container references, the HTTP workflow starts and fails
> immediately since the node is not listening anymore, resulting in failed
> activations. Or, an even worse situation can be when a container failed
> earlier, and a new container, initialized with a different action is
> initialized on the same host and port (more likely a problem for k8s/mesos
> cluster usage).
>
> To mitigate these issues, I put together a health check process [1] from
> invoker to action containers, where we can test
>
>   *   prewarm containers periodically to verify they are still
> operational, and
>   *   warm containers immediately after resuming them (before HTTP
> requests are sent)
> In case of prewarm failure, we should backfill the prewarms to the
> specified config count.
> In case of warm failure, the activation is rescheduled to ContainerPool,
> which typically would either route to a different prewarm, or start a new
> cold container.
>
> The test ping is in the form of tcp connection only, since we otherwise
> need to update the HTTP protocol implemented by all runtimes. This test is
> good enough for the worst case of “container has gone missing”, but cannot
> test for more subtle problems like “/run 

Action health checks

2019-10-26 Thread Tyson Norris
Hi Whiskers –
We periodically have an unfortunate problem where a docker container (or worse, 
many of them) dies off unexpectedly, outside of HTTP usage from invoker. In 
these cases, prewarm or warm containers may still have references at the 
Invoker, and eventually if an activation arrives that matches those container 
references, the HTTP workflow starts and fails immediately since the node is 
not listening anymore, resulting in failed activations. Or, an even worse 
situation can be when a container failed earlier, and a new container, 
initialized with a different action is initialized on the same host and port 
(more likely a problem for k8s/mesos cluster usage).

To mitigate these issues, I put together a health check process [1] from 
invoker to action containers, where we can test

  *   prewarm containers periodically to verify they are still operational, and
  *   warm containers immediately after resuming them (before HTTP requests are 
sent)
In case of prewarm failure, we should backfill the prewarms to the specified 
config count.
In case of warm failure, the activation is rescheduled to ContainerPool, which 
typically would either route to a different prewarm, or start a new cold 
container.

The test ping is in the form of tcp connection only, since we otherwise need to 
update the HTTP protocol implemented by all runtimes. This test is good enough 
for the worst case of “container has gone missing”, but cannot test for more 
subtle problems like “/run endpoint is broken”. There could be other checks to 
increase the quality of test we add in the future, but most of this I think 
requires expanding the HTTP protocol and state managed at the container, and I 
wanted to get something working for basic functionality to start with.
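The TCP-level test described above amounts to a plain connect attempt, roughly like this (a sketch, not the invoker's actual probe code): it detects a dead listener, but cannot detect a broken /run endpoint.

```python
import socket

def tcp_alive(host, port, timeout=1.0):
    """Return True if something is accepting TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```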

Let me know if you have opinions about this, and we can discuss here or in the 
PR.
Thanks
Tyson

[1] https://github.com/apache/openwhisk/pull/4698


Re: Dangers of renaming and removing runtime kinds

2019-09-17 Thread Tyson Norris
I think the PR is a good start, but I also think upgrading OpenWhisk is a 
hazardous area with a broader scope that operators need to consider when 
running OpenWhisk with any custom configuration (and nobody should probably run 
with default config, but I guess technically it would be possible). 
Considerations include:
* which custom invoker/controller akka configs may conflict with new config 
defaults? 
* runtime manifest changes or runtime changes (if you use public runtimes)
* kafka message schema changes
* docker image command/args changes that may surface if you have a custom 
deployment to run public images
* db changes (I guess less likely an issue since most operators would not run 
"custom schemas" on the db?)

I'm not sure how to produce an exhaustive list, but it may be something we 
should add to an "upgrading openwhisk" section of an "operators guide" - I 
don't know if such a guide currently exists.

Thanks
Tyson

On 9/16/19, 11:43 PM, "Sven Lange-Last"  wrote:

Hello Dave,

I absolutely agree that all adopters running Apache Openwhisk as a private 
or public production offering will or even should have their own runtimes 
manifest - like we do in IBM.

At the same time, we are using the Apache Openwhisk test suite to run 
against our IBM version of the system. When action kinds change in this 
test suite ("java" to "java:8"), this requires some work on our side. I 
admit that's our problem.

With my proposal to improve documentation, I wanted to make adopters aware 
of what runtime changes mean. Even if adopters have their own version of 
the runtimes manifest, I guess they start with a copy of the Apache 
Openwhisk default manifest. So when they set up their runtime manifest, 
they hopefully keep the new description to make maintainers of the file 
aware that removal of runtime kinds needs to be planned carefully.



Mit freundlichen Grüßen / Regards,

Sven Lange-Last
Senior Software Engineer
IBM Cloud Functions
Apache OpenWhisk






Re: Backpressure for slow activation storage in Invoker

2019-09-16 Thread Tyson Norris


On 9/16/19, 8:32 AM, "Chetan Mehrotra"  wrote:

Hi Tyson,

> in case of logs NOT in db: when queue full, publish non-blocking to 
"completed-non-blocking"

The approach I was thinking was to completely disable (configurable)
support for persisting activation from Invoker and instead handle all
such work via activation persister service.

That sounds fine. I thought there was a suggestion to try to optimize the 
storage path by only diverting to kafka in case the memory queue is full. I 
agree it is simpler to treat everything the same.

Thanks
Tyson   





Re: Please submit topics for this week's (Wed. 18th) Tech. Interchange call!

2019-09-16 Thread Tyson Norris
Hi Matt - 
Please add: Dan McWeeney - present some prototype code related to execution 
design discussion. 

Thanks!
Tyson 

On 9/16/19, 6:03 AM, "Matt Rutkowski"  wrote:

Hello Whiskers!

Please submit items for agenda for this Wednesday’s (Sept 18) Tech 
Interchange call.

Some topics I already have "penciled in" include:

  * Proposal for new Tech. Int. Meeting time(s) - Dom
  * JVM Pre-cache optimization work in Java runtime - Matt
  * OpenWhisk Tekton Pipeline update - Priti

Looking forward!
Matt

Day-Time: Wednesday Sept 18, 11AM EDT (Eastern US), 5PM CEST (Central 
Europe), 3PM GMT, 11PM (Beijing)
Zoom: 
https://zoom.us/my/asfopenwhisk




OpenWhisk Execution Design

2019-09-16 Thread Tyson Norris
Hi –
Here is a more detailed document regarding execution design that I briefly 
discussed at last meeting.
https://docs.google.com/document/d/1A8IyQ2Zjjl6WPc41DBWJa28bp7jEs46bvXVO_H77yBY/edit?usp=sharing

Please review and comment. Dan McWeeney will provide a brief demo  of some 
prototype code at this week’s meeting.

Related: to provide a PR to core repo that includes experimental code, Dan 
submitted a PR to exclude a directory from code scanning.
https://github.com/apache/openwhisk-utilities/pull/71

Thanks
Tyson



Re: Backpressure for slow activation storage in Invoker

2019-09-12 Thread Tyson Norris
I think this sounds good, but want to be clear I understand the consumers and 
producers involved - is this summary correct?

Controller: 
* consumes "completed-" topic (as usual)
Invoker:
* in case of logs NOT in db: when queue full, publish non-blocking to 
"completed-non-blocking"
*in case of logs in db: when queue full, publish all to "Activations" topic
OverflownActivationRecorderService (new service): 
* in case of logs NOT in db: consumes "completed-*" topic(s) AND 
"completed-non-blocking" topic
* in case of logs in db: consumes "Activations" topic
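The "completed-.*" pattern underlying this summary can be illustrated with a plain regex (a sketch of which topics a pattern subscriber would receive; the real consumer passes the pattern to Kafka's pattern-subscribe API rather than matching topic names itself):

```python
import re

# The pattern a single consumer group would subscribe with.
pattern = re.compile(r"completed-.*")

def matches(topic):
    """True if a 'completed-.*' subscriber would receive this topic."""
    return pattern.fullmatch(topic) is not None
```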

Thanks!
Tyson

On 9/11/19, 4:51 AM, "Chetan Mehrotra"  wrote:

As part of implementing this feature I came across support for topic
patterns in Kafka [1] [2]. It seems to allow the same consumer, or a group of
consumers, to listen to multiple topics. So after discussing with Sven
(thanks Sven!) I came up with following proposal

With this I think we can go back to "Option B1 - Activations via
controller topic" and thus subscribe to "completed-.*" pattern.

This would help by avoiding any extra load on Kafka, as we consume the
same activation result messages that are sent to the Controller. However
there are a few caveats:

1. Currently we send activation result via Kafka only for blocking calls
2. Result send does not contain logs

So we can possibly have support for 2 modes

Option CB1 - Existing topic + new topic for non blocking result
---

This mode would be used if the setup does not record the logs in db.
In this mode we would add support in Invoker to also send result for
non blocking calls to a new "completed-non-blocking" topic and then
listen for "completed-.*"

Option CB2 - New topic + KafkaActivationStore
--
This mode can be used if setup stores logs in db. Here we would have a
new KafkaActivationStore which would send the activations to a new
"activations" topic

The ActivationPersister service can support both modes and cluster
operator can configure it in required mode

Chetan Mehrotra
[1] https://doc.akka.io/docs/alpakka-kafka/current/subscription.html#topic-pattern
[2] https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#subscribe-java.util.regex.Pattern-org.apache.kafka.clients.consumer.ConsumerRebalanceListener-
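A pattern subscription as referenced in [1] and [2] picks up every topic whose name matches a regex. A minimal illustration of the matching rule, using plain Python rather than an actual Kafka client:

```python
import re

# Pattern proposed above: one consumer subscribes to all per-controller
# completion topics (and, in option CB1, the new non-blocking topic too).
COMPLETED = re.compile(r"completed-.*")

def matched_topics(pattern: re.Pattern, topics: list) -> list:
    """Return the topics a pattern subscription would pick up."""
    return [t for t in topics if pattern.fullmatch(t)]

# e.g. matched_topics(COMPLETED,
#   ["completed-controller0", "completed-non-blocking", "invoker0"])
# keeps the first two and drops "invoker0".
```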

On Mon, Jun 24, 2019 at 11:57 PM Chetan Mehrotra
 wrote:
>
> > For B1, we can scale out the service as controllers are scaled out, but it
> > would be much more complex to manually assign topics.
>
> Yes, that's what my concern was in B1. So for now I would target the B2
> approach where we have a dedicated new topic and then have it consumed
> by a new service. If it poses problems down the line then we can go
> for B1.
>
> Chetan Mehrotra
>
> On Tue, Jun 25, 2019 at 10:08 AM Dominic Kim  wrote:
> >
> > Let me share a few ideas on them.
> >
> > Regarding option B1, I think it can scale out better than option B2.
> > If I understood correctly, scaling out of the service will be highly
> > dependent on Kafka.
> > Since the number of consumers is limited to the number of partitions, the
> > number of service nodes will be also limited to the number of partitions.
> >
> > So in the case of B2, if we create a new topic with some partition numbers,
> > we cannot scale out the service nodes more than that.
> > At some point, we may need to alter the number of partitions and it's not
> > easy in Kafka.
> > (Since the activation processing here is asynchronous, we may bear some
> > downtime (1~2s) to alter the partition. Then it would be fine.)
> >
> > In the case of B1, there will be many controller topics with their own
> > partitions.
> > Since controllers can be scaled out, there will be more topics, and the
> > activation service can scale out accordingly.
> > But in this case, we need to manually control the topic assignment.
> > (Not partition assignment, it will be done by Kafka.)
> >
> > Let's say we have 3 controller topics with 2 partitions each.
> > For HA, it would be great to have at least two nodes.
> > At first, both nodes will take care of all three topics.
> > Based on the partition assignment plan in Kafka, both nodes will fetch
 

Re: Allow decision about action result inclusion in logs on a per call basis

2019-09-10 Thread Tyson Norris
Thanks for the info. It makes sense. I think it is slightly confusing that this 
field is not invoked anywhere in the code, because you only invoke it from 
another class that is managed externally. 
Thanks
Tyson

On 9/10/19, 2:45 AM, "Ruediger Maass"  wrote:

Hi Tyson, please find my answers below.

A couple questions:
* " Action results may contain sensible data that should not be logged." - 
I don't think this solves that problem, correct? E.g. the result or error 
could have sensitive data still, right? Since this is generated based on 
the user code and that code's error messages which may have just included 
the result details you were trying to hide.

> The error result in my example is created by the managed runtime, not by 
the user action code (that has not been started at this point). The error 
message in this case will never contain any user messages. On the other 
hand, user error messages that are driven by the action code are like 
normal log statements: the customer decides what is logged and which 
content error messages contain. This discussion is a good example why we 
do not propose any hard-coded strategy but a flexible flag that can be set 
as needed by product-specific SPI implementations. We know there are 
different opinions. We will set the flag with our product-specific code as 
we need. The generic character of ActivationFileStorage is still preserved. 
The flag makes the class even more reusable; we try to avoid introducing 
just another specialized copy of that class.

* Does it make sense to have the data be different in Activation json vs 
user_log?
One approach I've been wondering about is applying the same log to the 
Activation json, e.g.: do not store the result *anywhere* for success 
results. Obviously this cannot work for async invocations, but we have 
many cases where this data is never used, and just bloats the database. 
Some factors that may affect ability to NOT store the result include: sync 
vs async, "debugging" vs "prod", error vs success, large vs small. Your 
case signals to me that this is useful, at least to have as a config flag, 
but I would apply it at the Activation level, and not just downstream in 
the user_log.

> We also see that activations consume database space. But currently we 
want/need the current behavior of OpenWhisk as it is in order to provide a 
robust product implementation (especially also for action sequences). With 
or without the proposed flag it would still be easy to add a feature flag 
(or SPI implementation) like you discussed for changing the activation 
json in the database if that would fit better for your product 
implementation. As said, the default behavior of OpenWhisk is not changed 
by our change in any way. Please also note how tiny the code change is 
(github.com/apache/openwhisk/pull/4604) - we do not add more complexity to 
the code with this little change.

PR is github.com/apache/openwhisk/pull/4604

    Thanks, Ruediger



From:   Tyson Norris 
To: "dev@openwhisk.apache.org" 
Date:   06/09/2019 20:56
Subject:[EXTERNAL] Re: Allow decision about action result 
inclusion in logs on a per call basis



A couple questions:
* " Action results may contain sensible data that should not be logged." - 
I don't think this solves that problem, correct? E.g. the result or error 
could have sensitive data still, right? Since this is generated based on 
the user code and that code's error messages which may have just included 
the result details you were trying to hide.
* Does it make sense to have the data be different in Activation json vs 
user_log? 

One approach I've been wondering about is applying the same log to the 
Activation json, e.g.: do not store the result *anywhere* for success 
results. Obviously this cannot work for async invocations, but we have 
many cases where this data is never used, and just bloats the database. 
Some factors that may affect ability to NOT store the result include: sync 
vs async, "debugging" vs "prod", error vs success, large vs small. Your 
case signals to me that this is useful, at least to have as a config flag, 
but I would apply it at the Activation level, and not just downstream in 
the user_log.

Thanks
Tyson




On 9/6/19, 10:21 AM, "Ruediger Maass"  wrote:

corrected the subject of this email, sorry about this.
 
 
 
From:   Ruediger Maass/Germany/IBM
To: dev@openwhisk.apache.org
Date:   06/09/2019 19:18
Subject:Re: [EXTERNAL] Re: Allow decision about action result 
  

Re: Allow decision about action result inclusion in logs on a per call basis

2019-09-06 Thread Tyson Norris
A couple questions:
* " Action results may contain sensible data that should not be logged." - I 
don't think this solves that problem, correct? E.g. the result or error could 
have sensitive data still, right? Since this is generated based on the user 
code and that code's error messages which may have just included the result 
details you were trying to hide.
* Does it make sense to have the data be different in Activation json vs 
user_log? 

One approach I've been wondering about is applying the same log to the 
Activation json, e.g.: do not store the result *anywhere* for success results. 
Obviously this cannot work for async invocations, but we have many cases where 
this data is never used, and just bloats the database. Some factors that may 
affect ability to NOT store the result include: sync vs async, "debugging" vs 
"prod", error vs success, large vs small. Your case signals to me that this is 
useful, at least to have as a config flag, but I would apply it at the 
Activation level, and not just downstream in the user_log.

Thanks
Tyson




On 9/6/19, 10:21 AM, "Ruediger Maass"  wrote:

corrected the subject of this email, sorry about this.



From:   Ruediger Maass/Germany/IBM
To: dev@openwhisk.apache.org
Date:   06/09/2019 19:18
Subject:Re: [EXTERNAL] Re: Allow decision about action result 
inclusion in logs on a per call basis



Ok, I see the problem. I'll try to make it clearer:

- with "action log" I mean the output of action invocations that is logged 
for the users (stored in the userlogs-.log files of the invokers and 
controllers (for action sequences)

- with "activation json" I mean the json document for an activation like 
this:
{
"activationId": "5a2b19119c574ce7ab19119c572ce730",
"annotations": [
{
"key": "path",
"value": "ruediger.ma...@de.ibm.com_MySpaceUS1/compileError"
}, 
   ...etc 
],
"duration": 451,
"end": 1567721624876,
"logs": [],   // no log lines in this case
"name": "compileError",
"namespace": "ruediger.ma...@de.ibm.com_MySpaceUS1",
"publish": false,
"response": {
"result": {
"error": "Initialization has failed due to: SyntaxError: 
Unexpected identifier\nat NodeActionRunner.init 
(/nodejsAction/runner.js:79:109)\nat doInit 
(/nodejsAction/src/service.js:142:31)\nat initCode 
(/nodejsAction/src/service.js:81:24)\nat /nodejsAction/app.js:69:13\n  
 at Layer.handle [as handle_request] 
(/node_modules/express/lib/router/layer.js:95:5)\nat next 
(/node_modules/express/lib/router/route.js:131:13)\nat Route.dispatch 
(/node_modules/express/lib/router/route.js:112:3)\nat Layer.handle [as 
handle_request] (/node_modules/express/lib/router/layer.js:95:5)\nat 
/node_modules/express/lib/router/index.js:277:22\nat 
Function.process_params 
(/node_modules/express/lib/router/index.js:330:12)"
},
"status": "action developer error",
"success": false
},
"start": 1567721624425,
"subject": "ruediger.ma...@de.ibm.com",
"version": "0.0.1"
}

- We do not want to change the activation json. We want to change the 
output to the action log that is produced from the activation json.

- one activation json produces one line in the action log of type 
'activation_record' and n lines of type 'user_log', one user_log line per 
output line that is printed by the action. The user_log lines are stored 
in the 'logs' field of the activation json. This would be the output in 
the action log for the above mentioned invocation:

1 line of type activation_record and 0 lines of type user_log; I formatted the 
one line for better readability:

{"activationId":"68484f5c911d412b884f5c911df12b45",
"duration":498,
"end":1567785044285,
   ...etc. ...
   // here's the result from above we do not want to be suppressed:
"message":"{\"error\":\"Initialization has failed due to: SyntaxError: 
Unexpected identifier\\n 
  at NodeActionRunner.init (/nodejsAction/runner.js:79:109)\\n 
  at doInit (/nodejsAction/src/service.js:142:31)\\n
  at initCode (/nodejsAction/src/service.js:81:24)\\n
  at /nodejsAction/app.js:69:13\\n
  at Layer.handle [as handle_request] 
(/node_modules/express/lib/router/layer.js:95:5)\\n
  at next (/node_modules/express/lib/router/route.js:131:13)\\n
  at Route.dispatch (/node_modules/express/lib/router/route.js:112:3)\\n
  at Layer.handle [as handle_request] 
(/node_modules/express/lib/router/layer.js:95:5)\\n
  at /node_modules/express/lib/router/index.js:277:22\\n
  at Function.process_params 

Re: Can we adjust the time for the community call

2019-09-05 Thread Tyson Norris
US West coast dweller here - I'm OK with either 1 hour earlier, or alternating 
times. Anything to get more players on the field :)

Thanks
Tyson

On 9/5/19, 3:53 AM, "Rodric Rabbah"  wrote:

Thanks Dominic for bringing this up. +1 from me.
What about alternating times (every two weeks) so other time zones aren't
always so late in the day?

-r

On Wed, Sep 4, 2019 at 11:27 PM Dominic Kim  wrote:

> Dear community members.
>
> I wonder whether we can set up our Tech. Int. meeting 1-hour earlier.
> Currently, we hold the meeting at 3 PM GMT which is 12:00 AM KST/JST.
>
> If we hold the meeting 1-hour earlier, the time changes like this:
>
> - 2 PM GMT.
> - 10 AM US eastern time
> - 10 CST
> - 11 KST/JST
>
> One issue is the US western time would be 7 AM.
> So if anyone lives in that area, I would give up.
>
> But if it's fine for most of the members, I hope we start the meeting
> 1-hour earlier.
> It would help me and other folks living in Korea/Japan to join the meeting
> better.
>
>
> Best regards
> Dominic
>




Tech Interchange meeting tomorrow

2019-09-03 Thread Tyson Norris
Hi All –
Please submit items for agenda for tomorrow’s (Sept 4) Tech Interchange call.

I would plan to discuss

  *   A forthcoming proposal for action invocation workflows.
  *   Adding a sandbox area to repo for experimentation (directory in main repo 
vs separate repo).

See you then!
Tyson

Day-Time: Wednesday Sept 4, 11AM EDT (Eastern US), 5PM CEST (Central Europe), 
3PM GMT, 11PM (Beijing)
Zoom: https://zoom.us/my/asfopenwhisk




Re: Passing TransactionId as part of action invocation

2019-08-21 Thread Tyson Norris
I think the point of transaction id in this case is to correlate multiple 
activations, similar to how a sequence works, but not relying on sequence as 
the mechanism for doing this. 

Today, if you launch many activations explicitly, e.g. using OW SDK in your 
nodejs action, they are not "related" to each other, and this would offer a way 
to work around that. Initially, just storing the transaction id, means that 
operators can create queries to stitch multiple activations that originated 
from the same request. It would also be possible to expose the transaction id to 
users in the same way that the activation id is, using a first-class API, maybe as 
part of the existing activation API, e.g. GET 
o/api/v1/namespaces/_/activations?tid=

Users can certainly use APIs wrong, but with decent documentation, I don't 
think this should dissuade us from providing the feature.
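Correlating activations on the operator side could then be as simple as keying records by the stored transaction id. A hypothetical sketch; the `transactionId` field name is an assumption from this thread, not a fixed OpenWhisk schema:

```python
from collections import defaultdict

def group_by_transaction(activations):
    """Group activation records by the transactionId stored with them, so
    activations that originated from the same request can be stitched
    together (hypothetical field name, per this proposal)."""
    groups = defaultdict(list)
    for record in activations:
        groups[record["transactionId"]].append(record["activationId"])
    return dict(groups)
```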

Thanks
Tyson

On 8/21/19, 8:16 AM, "Martin Henke"  wrote:

Chetan,

from an operational point of view I have some fear that we will confuse the 
user by making the transaction id visible as a second id besides the 
activation id. 
Some will certainly use it to fetch activation records and fail, which will 
lead to questions.
Any thoughts from your side ?

Regards,
Martin


> On 20. Aug 2019, at 12:32, Chetan Mehrotra  
wrote:
> 
> I created a separate thread to discuss how to store such metadata related
> to activation.
> 
> Current open PR #4586 only enables exposing the transactionId to env. It
> does not make any attempt to store the transactionId currently. Once we
> decide how such data should be stored then I can open PR for  the same
> 
> Chetan Mehrotra
> 
> 
> On Mon, Aug 19, 2019 at 8:47 AM Rodric Rabbah  wrote:
> 
>> Yes indeed. Your pr already open I think is fine as is.
>> 
>> -r
>> 
>> On Aug 19, 2019, at 11:36 AM, Chetan Mehrotra 
>> wrote:
>> 
 That’s true. Time for api/v2...
>>> 
>>> This is now becoming a rabbit hole! What option should we use without
>> going
>>> for v2?
>>> 
>>> 1. Introduce a new "meta" sub document
>>> 2. OR Change annotations to flat map while storing but transform that to
>>> array based structure while returning to client
>>> 
>>> Chetan Mehrotra
>>> 
>>> 
 On Mon, Aug 19, 2019 at 7:15 AM Rodric Rabbah  wrote:
 
 
> However changing them now would cause compatibility
> issue with various tooling out there which may be interpreting the
> annotation per current design
 
 That’s true. Time for api/v2... 
>> 





Re: Recording metadata related to activation

2019-08-21 Thread Tyson Norris
This part (exposing transaction id to action code) is provided via 
https://github.com/apache/openwhisk/pull/4586

I'm not sure what other meta may exist or be planned that does not already follow 
this pattern, but I agree it should all be included where possible - we cannot 
include the "duration", since that is only available after execution, but 
action config, like limits, may be useful to include here as well? 

For now, the data fields from ActivationMessage and ExecutableWhiskAction are 
explicitly extracted and provided to the runtime in an "environment" map - we 
could certainly change this to be more generic, like inferring map keys from 
all fields, or just sending json, but this is a bigger change to coordinate 
with runtimes, and gets into the question of whether /init and /run should have 
different signatures, I think.

I think a first step is to create separate meta dictionary on Activation 
(option 1) without changing the API (use annotations) or runtimes. We can 
iterate on invoker/runtime coordination to make passing this data more 
consistent, and change /init /run orchestration separately as needed. 

Thanks
Tyson
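Option 1 (a separate meta sub-document on the activation record) might look like the sketch below. The field names (transactionId, podName, clusterId) are examples from this thread, not a settled schema:

```python
def with_meta(activation: dict, **meta) -> dict:
    """Attach operator metadata under a dedicated 'meta' sub-document
    instead of overloading the 'annotations' array (option 1 above).
    Returns a new record; does not mutate the input."""
    record = dict(activation)
    record["meta"] = {**record.get("meta", {}), **meta}
    return record

# e.g. with_meta(act, transactionId="tid0", podName="invoker-00-1")
```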

 
On 8/21/19, 3:05 AM, "Erez Hadad"  wrote:

On the same note, why not also expose this "meta" information to the 
action code *at runtime*? 
The current direction this discussion is going seems to be having the 
"meta" information only after the action completes, in an activation 
record (under new key or as annotations).

However, think of the following use-case: the "transaction id" can be 
useful for having multiple actions performing computation as part of a 
single transaction, and updating a DB. In such a case, the action code 
needs to know the transaction id so it can be passed to the DB service, 
marking the resulting update as part of the broader transaction. 
Similar cases can be made for other fields. 

Bottom line: I think this "meta" information needs to be more streamlined 
end-to-end, available to code during invocation and persisted post-factum 
in the activation record.

Regards,
-- Erez




From:   Dominic Kim 
To: dev@openwhisk.apache.org
Date:   21/08/2019 02:58
Subject:[EXTERNAL] Re: Recording metadata related to activation



That would be useful from the operator point of view.
One question is "would that information be exposed to users"?

I think the information which is exposed to users should be
platform-independent.
No matter which underlying platform/implementation is being used, users do
and should not need to know about the internal.
So that even if the operator changes their internals(K8s, native, cluster
federation, ...) there should be no difference in user experience.

One option can be storing them as parts of an activation for operators but
exclude them when returning them in response to the user request.
Though I am not sure whether this can be aligned with what you keep in 
your
mind.


Regarding the two structure options, I am inclined to use the existing
structure "annotations" as it does not introduce any schema change.
However, I also found it cumbersome to manipulate them in many cases.
I feel it would be great to change annotations to a dictionary at some
point.

Since I am not aware of the history, I am curious whether there is any
specific reason that annotations should be the current form.

Best regards
Dominic

On Wed, Aug 21, 2019 at 12:38 AM, Matt Sicker wrote:

> I mean, unless you're using these correlation ids in your business
> logic, I don't see the problem of storing them in the database. My own
> thoughts on using this feature would all be diagnostics-related. I'm
> not running any non-trivial functions, though.
>
> On Tue, 20 Aug 2019 at 05:30, Chetan Mehrotra 

> wrote:
> >
> > Hi Team,
> >
> > Branching the thread [1] to discuss how to record some metadata
> > related to activation. Based on some of the usecases I see a need to
> > record some more metadata related to activation. Some examples are
> >
> > 1. transactionId - Record the transactionId for which the activation 
is
> part of
> > 2. pod name - Records the pod running the action container when using
> > KubernetesContainerFactory
> > 3. invocationId - Some id returned by underlying system when
> > integrating with AWS Lambda or Azure Function
> > 4. clusterId - If running multiple clusters for same system we would
> > like to know which cluster handled the given execution
> >
> > Some of these ids are determined as part of `ContainerResponse` itself
> > and have to be made part of activation json such that later we can
> > correlate the activation with other parts.
> >
> > Now we need to 

Re: Passing TransactionId as part of action invocation

2019-08-16 Thread Tyson Norris
I think if OW SDK, and sequences/compositions, propagate X-Request-Id
header (using the existing transaction id/X-Request-Id), the parent is not
needed? i.e. there may be 2 parts to this effort:
- expose the transaction id to runtime container
- propagate the transaction id in requests initiated from runtime
container/controller



On Thu, Aug 15, 2019 at 10:38 AM Rodric Rabbah  wrote:

> In general yes but I think generally do you need the transaction id or the
> parent id for an activation?
>
> This issue is relevant - https://github.com/apache/openwhisk/issues/3083.
> I also recall in the early days of the composer, we wanted a way to query
> parent/child activations but this requires new couch views and we didn't
> pursue it.
>
>
>
> On Thu, Aug 15, 2019 at 1:20 PM Chetan Mehrotra  >
> wrote:
>
> > Currently we pass the `activation_id` as part of `/run` call to any
> > action runtime [1]. Would it be fine to also pass the `TransactionId`
> > such that it can be accessed by action code?
> >
> > One usecase of this would be to enable tracing a sequence/composition
> > by linking all activations which are part of same transaction in
> > epsagon [2]
> >
> > Chetan Mehrotra
> > [1]
> >
> https://github.com/apache/openwhisk/blob/master/docs/actions-new.md#activation
> > [2]
> >
> https://epsagon.com/blog/epsagon-makes-troubleshooting-apache-openwhisk-a-snap/
> >
>


Re: Openwhisk prewar containers

2019-07-24 Thread Tyson Norris
Hi Adam - 
Currently there is no support for prewarming blackbox containers. If you need 
prewarm support, you can add them to the runtimes manifest configured in your 
deployment, define prewarm configs for each there, and reference them from 
actions using "--kind ".

Best
Tyson
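For reference, a prewarm ("stem cell") entry in the runtimes manifest looks roughly like the fragment below. Treat the exact field names as version-dependent assumptions and check the manifest shipped with your deployment (some manifest versions use different keys for the prewarm count):

```json
{
  "runtimes": {
    "nodejs": [
      {
        "kind": "nodejs:10",
        "default": true,
        "image": { "prefix": "openwhisk", "name": "action-nodejs-v10", "tag": "latest" },
        "stemCells": [ { "count": 2, "memory": "256 MB" } ]
      }
    ]
  }
}
```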

On 7/24/19, 11:10 AM, "Adam Versano"  wrote:

Hi,

I’m running Openwhisk on rancher for my production env.
I'm using kubernetes factory
My actions are written in go and compiled, so they are of kind blackbox.

Is there any way I can prewarm this actions?

Thanks






Re: OpenWhisk invoker overloads - "Rescheduling Run message"

2019-07-19 Thread Tyson Norris
   The PR you referenced above contains a lot of 
other changes. It does not only improve this particular area but also 
includes a lot of other changes - in particular, it adds a different way 
of managing containers. Due to the PR's size and complexity, it's very 
hard to understand and review... Would you be able to split this PR up 
into smaller changes?

Indeed, I will split the PR ASAP. There are several changes there that are 
candidates for this, I think, so I will try to create a PR for each.
Thanks
Tyson




Re: Re: OpenWhisk invoker overloads - "Rescheduling Run message"

2019-07-08 Thread Tyson Norris
Related to the "Rescheduling Run message", one problem we have encountered in 
these cases is that the invoker becomes unstable due (I think) to a tight 
message loop, since the message that couldn't run is immediately resent to the 
pool to be run, which fails again, etc. We saw CPU getting pegged, and the 
invoker eventually would crash.
I have a PR related to cluster managed resources where, among other things, 
this message looping is removed:
https://github.com/apache/incubator-openwhisk/pull/4326/files#diff-726b36b3ab8c7cff0b93dead84311839L198

Instead of resending the message to the pool immediately, it just waits in the 
runbuffer, and the runbuffer is processed in reaction to any potential change 
in resources: NeedWork, ContainerRemoved, etc. This may add delay to any 
buffered message(s), but seems to avoid the catastrophic crash in our systems. 

Thanks
Tyson
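The buffering behavior described above can be sketched as follows. This is a hypothetical simplification, not the actual ContainerPool implementation: a message that cannot be placed is parked, and the buffer is drained only when the pool signals a resource change (NeedWork, ContainerRemoved, ...), avoiding the tight resend loop:

```python
class RunBuffer:
    """Park Run messages that could not be placed and retry them only on
    resource events, instead of resending immediately (sketch)."""

    def __init__(self, try_run):
        self.buffer = []        # FIFO of messages awaiting capacity
        self.try_run = try_run  # callback: attempt to place a message

    def submit(self, msg) -> bool:
        # Preserve ordering: only try immediately if nothing is buffered.
        if not self.buffer and self.try_run(msg):
            return True
        self.buffer.append(msg)  # wait; no immediate re-send to the pool
        return False

    def on_resource_event(self):
        # NeedWork / ContainerRemoved etc.: drain head-of-line messages
        # while capacity allows.
        while self.buffer and self.try_run(self.buffer[0]):
            self.buffer.pop(0)
```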

On 7/5/19, 1:16 AM, "Sven Lange-Last"  wrote:

Hello Dominic,

thanks for your detailed response.

I guess your understanding is right - just this small correction:

> So the main issue here is there are too many "Rescheduling Run" messages
> in invokers?

It's not the main issue to see these log entries in the invoker. This is 
just the indication that something is going wrong in the invoker - more 
activations are waiting to be processed than the ContainerPool can 
currently serve.

Actually, there are different reasons why "Rescheduling Run message" log 
entries can show up in the invoker:

1. Controllers send too many activations to an invoker.

2. In the invoker, the container pool sends a Run message to a container 
proxy but the container proxy fails to process it properly and hands it 
back to the container pool. Examples: a Run message arrives while the 
proxy is already removing the container; if concurrency>1, the proxy 
buffers Run messages and returns them in failure situations.

Although I'm not 100% sure, I see more indications for reason 1 in our 
logs than for reason 2.

Regarding hypothesis "#controllers * getInvokerSlot(invoker user memory 
size) > invoker user memory size": I can rule out this hypothesis in our 
environments. We have "#controllers * getInvokerSlot(invoker user memory 
size) = invoker user memory size". I provided PR [1] to be sure about 
that.
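The shard invariant being checked can be stated as a one-liner. This is a hypothetical simplification of getInvokerSlot(); the real book-keeping also enforces a minimum slot size:

```python
def invoker_slot_mb(invoker_user_memory_mb: int, controller_count: int) -> int:
    """Each controller's book-kept share of one invoker's user memory.
    Integer division guarantees the invariant discussed above:
    controller_count * slot <= invoker_user_memory_mb."""
    return invoker_user_memory_mb // controller_count
```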

Regarding hypothesis "invoker simply pulls too many Run messages from 
MessageFeed". I think the part you described is perfectly right. The 
question remains why controllers send too many Run messages or a Run 
message with an activation that is larger than free memory capacity 
currently available in the pool.

The load balancer has a memory book-keeping for all of its invoker shards 
(memory size determined by getInvokerSlot()) so the load balancer is 
supposed to only schedule an activation to an invoker if the required 
memory does not exceed controller's shard of the invoker. Even if 
resulting Run messages arrive on the invoker in a changed order, the 
invoker's shard free memory should be sufficient.

Do you see a considerable number of "Rescheduling Run message" log entries 
in your environments?

[1] https://github.com/apache/incubator-openwhisk/pull/4520


Regards,

Sven Lange-Last
Senior Software Engineer
IBM Cloud Functions
Apache OpenWhisk






Re: exporting activation arguments to the environment

2019-06-27 Thread Tyson Norris
BTW, circling back on "can it be done just using -p in current form": I still 
like the idea that:
- action-configured params are different from user-specified params (and we 
should only pass them on init, not run)
- this is also a bigger convention change for action developers (read from env 
or context instead of the single args object given to main)

But - *adding* the -e, instead of revising the meaning of -p does makes this a 
backwards compatible change, so that's a good thing.

Thanks
Tyson

On 6/27/19, 5:00 AM, "Rodric Rabbah"  wrote:

Thanks for the added feedback - keep it coming!

Tyson is right in that we shouldn't let implementation concerns affect a
proper design and sacrifice the experience.

So in line with his concerns, is there a desire to facilitate environment
variables at all (user specified values, not system/context ones)?

If so, then having thought about it some more, annotating every parameter
could be done as suggested say with a -e vs a -p.

-r




Re: exporting activation arguments to the environment

2019-06-26 Thread Tyson Norris
For the incremental change I would suggest:
- include the action-configured params as args to init
- optionally: include an invoker flag to optionally remove action-configured 
params from being sent to run (these would only be available by way of init 
setting env vars, or exposing some other object)
- runtimes can be updated incrementally to support this flagged invoker behavior

I understand the point on the amount of work; I'm just not wanting to accept an 
awkward behavior (different treatment for params in upper case) for the sake of 
time. There are some accumulating issues around init and run, so I'm not sure it 
is worth making the problem worse before addressing the other issues. I think 
it's ok to change init to receive these params as a minimal change, but 
obviously without additional changes either at the invoker or in OW's convention 
for designing functions (main sig, how to access init vars, context) there will 
be some incremental pain as well. 

Still curious what others think.


On 6/26/19, 11:27 AM, "Rodric Rabbah"  wrote:

> Sorry, it still seems controversial to me, not sure how others feel?

That's why we discuss on the dev list :D Thanks for the feedback so far.

> can you confirm this is decided based on the case of the parameter name?

Indeed, we need some rule to then partition the parameter list. Using the
convention that the env var starts with a capital letter is one. Other
conventions are plausible.

> adding a '-e' flag that specifically does "set these environment
variables"

Sure - but this increases the complexity of implementation significantly
for not a lot of gain. To add a -e, we'd need to modify the schema for
actions. For example, we could add annotation for each parameter name to be
treated as an environment variable using the existing annotations, and use
these annotations as the criteria. We could create a new field in the
actions object to hold the parameters (a schema change). We could annotate
each parameter (also a schema change).

Since a developer already controls the names of their parameters today,
they have complete control over this partitioning.

If we're open to schema changes, then we can explore a cleaner
implementation but an incremental approach that at least makes the feature
available incrementally would also make sense since making a schema change
is a lot more invasive, coupled with a few changes needed at the invoker
level plus all the runtimes.

-r






Re: exporting activation arguments to the environment

2019-06-26 Thread Tyson Norris
Sorry, it still seems controversial to me, not sure how others feel?

To be clear, when you added "-a partition-arguments true", the result is 2 
things:
1. some of the -p args are now treated differently than others - can you confirm 
this is decided based on the case of the parameter name?
2. init receives these params (which sounds good to me). 

Regardless of opting in to this behavior, having action-configured parameters 
referenced differently based on the name of the param seems bad.  I understand 
there are some useful conventions defining these as env vars, but my point is 
that this doesn't seem at all like an explicit choice. I think an explicit 
choice would be more like adding a '-e' flag that specifically does "set these 
environment variables", instead of overloading the '-p' flag with a convention 
based on the name of the variable.

Thanks
Tyson


On 6/26/19, 10:43 AM, "Rodric Rabbah"  wrote:

Maybe this got missed, but here's how I conceived of this. I'll use wsk CLI
commands; I think that makes it obvious.

wsk action create myAction code.js -p MY_ENV true -p some_param false -a
partition-arguments true

The annotation (partition-arguments) makes it explicit for the developer to
control whether "main" receives the arguments as they do today, which is
this object
{ MY_ENV: true, some_param: false}, or when the annotation is true, {
some_param: false} and process.env.MY_ENV is set to true.

I don't think there's anything confusing about this in that the developer
has decided what variables to export to the environment, and is making an
explicit choice.

Environment variables on a number of platforms are restricted to those that
consist of words that start with a capital letter (AWS and Netlify are two
prime examples).

The alternative, today, requires a function to export any variables from
"main" to the environment. So it would explicitly export MY_ENV to the
environment. The change we're discussing frees the programmer from having
to do that.

The change to the runtime proxies would be to accept an additional value on
/init and export all the properties it contains to the environment.

Before I address the POST invoke issue, I'd like to make sure my
explanation is clearer and see if this is still controversial.

    -r
    
    
On Wed, Jun 26, 2019 at 1:21 PM Tyson Norris 
wrote:

> Are you saying one is exported to environment, during init, based on
> parameter name being UPPER case? Forgetting use of env vars for a minute,
> this seems confusing to treat parameters differently based on names. I would
> rather see either a) all action-configured params sent to init only, and
> never to run or b) all action-configured params sent to run as context
> object.
>
> What the runtime does at init (use env vars or not) can be different per
> runtime, but in the action-configured parameter case I don't see any
> problem with setting env vars, except that there seems to be a convention
> in some cases that allows invoking clients to "override" these values 
using
> POST parameters at invocation time. This also seems confusing but could
> also be enforced differently by various runtimes, although ideally I would
> rather see the convention change to: action-configured parameters are
> always sent to init, and always visible to run, regardless of what client
> sends as execution parameters.
>
> Thanks
> Tyson
>
>
> On 6/25/19, 3:32 PM, "Rodric Rabbah"  wrote:
>
> Context and Knative I view as orthogonal.
>
> That is, for the context object, it is another way of encapsulating
> arguments. It doesn’t export variable to the process environment.
>
> You can provide an action with both environment variables, arguments
> to main, and a context object. They are orthogonal.
>
> For the context object, the distinction that was necessary from
> previous discussions was related to separating intra container concurrent
> executions. If the system-provided context is exported to the environment
> as it today the values clobber each other. For this, the context object
> would make sense.
>
> I’m simply talking about two parameters wsk ... “-p a A” and “-p B b”
> say where one becomes exported to the environment as B=b and the other is
> passed to the action as ({a:A}).
>
> I’m going to set the knative discussion aside because I think it’s a
> distraction. With knative you can bind environment variables to the
> container. As you would with any other container.
>
> I think it’s to

Re: exporting activation arguments to the environment

2019-06-26 Thread Tyson Norris
Are you saying one is exported to environment, during init, based on parameter 
name being UPPER case? Forgetting use of env vars for a minute, this seems 
confusing to treat parameters differently based on names. I would rather see 
either a) all action-configured params sent to init only, and never to run or 
b) all action-configured params sent to run as context object. 

What the runtime does at init (use env vars or not) can be different per 
runtime, but in the action-configured parameter case I don't see any problem 
with setting env vars, except that there seems to be a convention in some cases 
that allows invoking clients to "override" these values using POST parameters 
at invocation time. This also seems confusing but could also be enforced 
differently by various runtimes, although ideally I would rather see the 
convention change to: action-configured parameters are always sent to init, and 
always visible to run, regardless of what client sends as execution parameters.

Thanks
Tyson 


On 6/25/19, 3:32 PM, "Rodric Rabbah"  wrote:

Context and Knative I view as orthogonal. 

That is, for the context object, it is another way of encapsulating 
arguments. It doesn’t export variable to the process environment. 

You can provide an action with both environment variables, arguments to 
main, and a context object. They are orthogonal.

For the context object, the distinction that was necessary from previous 
discussions was related to separating intra container concurrent executions. If 
the system-provided context is exported to the environment as it today the 
values clobber each other. For this, the context object would make sense. 

I’m simply talking about two parameters wsk ... “-p a A” and “-p B b”  say 
where one becomes exported to the environment as B=b and the other is passed to 
the action as ({a:A}). 

I’m going to set the knative discussion aside because I think it’s a 
distraction. With knative you can bind environment variables to the container. 
As you would with any other container.

I think it’s too simplistic to say knative has a single endpoint. After all 
there are readiness probes and possible pre/post start hooks that operators may 
have to deal with. Init can be viewed as the readiness probe.

Fundamentally I believe the actor model is much better aligned with the 
reactive programming model for functions so this will tend toward a completely 
different discussion in my view.

The reason my proposal sets the environment variables at init time is 
that’s how env vars work; they exist before you start your process. While they 
don’t need to be immutable, it makes sense to treat them as such. 

For webaction parameters that one would export to an environment, they are 
already immutable and cannot be overridden. So really you would not use them 
for anything that varies per activation.

The view here is that you can export global (immutable) variables to the 
action. This makes it easier to take existing code and containers which might 
use env vars and use them almost off the shelf. 

-r

> On Jun 25, 2019, at 6:07 PM, Tyson Norris  
wrote:
> 
> I had to read this several times, but have some suggestions. I think when 
you say "action's arguments", you mean action-configured params, e.g. `wsk 
action create --param p1 v1`?
> 
> My preferences would be:
> - we should split off "run" args into context and params - this is the 
convention change for redefining main(args) as main(context, args) we have 
discussed in the past. 
> - I support either having init receive action-configured params 
> - activation args that are possibly overridden should behave exactly as 
specified args - is it important that action-configured args are actually 
overridden, if the context and params are separated? (receive both values, and 
logic must decide when to use which)
> - let's not use env variables for any arg that is variable per activation 
- it is impossible if you support concurrency, and unneeded if we pass the 
context to "run". 
> 
> Regarding Matt's suggestion to remove init - I like this idea, but I have 
concerns compared to knative which might serve every function with a different 
container, vs having some containers reused for multiple functions. In the case 
where we init code into an already running container, it is useful to have the 
init process separate from run, since otherwise each runtime will need to track 
its own init state and queue requests during init etc. If I'm not getting the 
whole picture with knative, please correct me.
> 
> 
> Thanks
> Tyson 
> 
> On 6/24/19, 8:43 AM, "Rodric Rabbah"  wrote:
> 
>In the current activation model, an action's arguments are always 
pr

Re: exporting activation arguments to the environment

2019-06-25 Thread Tyson Norris
I had to read this several times, but have some suggestions. I think when you 
say "action's arguments", you mean action-configured params, e.g. `wsk action 
create --param p1 v1`?

My preferences would be:
- we should split off "run" args into context and params - this is the 
convention change for redefining main(args) as main(context, args) we have 
discussed in the past. 
- I support either having init receive action-configured params 
- activation args that are possibly overridden should behave exactly as 
specified args - is it important that action-configured args are actually 
overridden, if the context and params are separated? (receive both values, and 
logic must decide when to use which)
- let's not use env variables for any arg that is variable per activation - it 
is impossible if you support concurrency, and unneeded if we pass the context 
to "run". 

Regarding Matt's suggestion to remove init - I like this idea, but I have 
concerns compared to knative which might serve every function with a different 
container, vs having some containers reused for multiple functions. In the case 
where we init code into an already running container, it is useful to have the 
init process separate from run, since otherwise each runtime will need to track 
its own init state and queue requests during init etc. If I'm not getting the 
whole picture with knative, please correct me.


Thanks
Tyson 

On 6/24/19, 8:43 AM, "Rodric Rabbah"  wrote:

In the current activation model, an action's arguments are always provided
to the action on "run", not "init".

Should we consider partitioning the argument list into two sets, the first
is exported as environment variables at "init" time, and the second become
the action's argument at "run" time? A criterion for partitioning is that
the environment variable starts with a capital letter, which is a common
convention.

For example, an action which is invoked with a JSON object

{ "XYZ": true,
  "abc" : false }

would receive {"abc": false} as its arguments and can read XYZ from the
environment (as process.env.XYZ == "true" in Node.js).
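
For illustration, a minimal sketch in Node.js of what this partitioning could look like (the capital-letter regex and the string coercion are assumptions about the policy, not the actual invoker implementation):

```javascript
// Hypothetical sketch of the proposed partitioning policy: parameter names
// starting with a capital letter become environment variables at init time,
// and the rest remain run-time arguments. Not the actual invoker code.
function partitionArguments(params) {
  const env = {};
  const args = {};
  for (const [key, value] of Object.entries(params)) {
    if (/^[A-Z]/.test(key)) {
      env[key] = String(value); // environment values are strings, e.g. "true"
    } else {
      args[key] = value;
    }
  }
  return { env, args };
}

const { env, args } = partitionArguments({ XYZ: true, abc: false });
// env  -> { XYZ: "true" }, exported so process.env.XYZ == "true" after init
// args -> { abc: false }, passed to main on run
```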

This change would:
1. require a change in the invoker to pass arguments during initialization

2. require a change in the runtime proxies to export the arguments to the
environment at initialization time (additional work may be implied by 1b)

3. an annotation on actions to opt into this partitioning for backward
compatibility or to opt out. For example '-a env-partition-arguments true'
partitions the arguments and actions without this annotation are not
affected.

Some obvious question:
Q1a. should the invoker perform the partitioning or delegate it to the
runtime? The advantage of the former is that the runtimes do not have to
implement the filtering policy and do less work. I think it makes sense to
do this invoker side for uniformity.

Q1b. should the partitioning treat environment variables as immutable post
init and ignore the partition on warm starts? This is an issue when a value
is overridden during POST invoke only since for a webaction, you cannot
override a value that's already defined (and updating a bound parameter on
an action invalidates warm containers). I think env vars should be treated
as immutable despite the issue with POST invoke.

-r




Re: Re: Backpressure for slow activation storage in Invoker

2019-06-21 Thread Tyson Norris
The logs issue is mostly separate from the activation records. 
RE activation records:
Can we handle these in the same way as user events? Maybe exactly like user 
events, as in using a single service to process both topics.


RE logging:
We deal with logs this way (collect from container via fluent), but the 
problems we've seen are:
- forcing structure isn't easy to enforce consistently on all runtimes (we get 
around this by mostly only supporting nodejs; java may also not be too bad if 
you support a few common logging libs)
- there are separate issues with concurrency support - for nodejs we use 
cls-hooked to manage this context; using environment variables instead of a 
different main signature is also a problem for concurrency, and also can be 
worked around with cls-hooked. Not sure other operators care about this so 
maybe it is not an issue except to be documented per runtime.
- many tests currently assume that the logs are immediately available, and this 
cannot be the case with decoupled log collection (I've started work to add some 
retries/delay tolerance, but it's incomplete)
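
As a sketch of the kind of retry/delay tolerance mentioned above (a hypothetical helper, not part of the OpenWhisk test suite; `fetchLogs` stands in for whatever call a test uses to retrieve activation logs):

```javascript
// Hypothetical retry helper for tests that read logs from a decoupled log
// store: poll until log lines appear, up to a bounded number of attempts.
async function retryUntilLogs(fetchLogs, { attempts = 10, delayMs = 1000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const logs = await fetchLogs(); // expected to return an array of log lines
    if (logs.length > 0) return logs;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`logs not available after ${attempts} attempts`);
}
```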

Thanks
Tyson


On 6/21/19, 10:29 AM, "David P Grove"  wrote:




Rodric Rabbah  wrote on 06/20/2019 09:37:38 PM:
>
> Overflowing to Kafka (option b) is better. Actually I would dump all
> the activations there and have a separate process to drain that
> Kafka topic to the datastore or logstore.

I agree. Spilling to Kafka is desirable to avoid OOMs in the invoker.

> There is another approach of routing the logs directly to a logstore
> without going through the invoker at all. IBM may have experimented
> with this maybe someone else can comment on that.

In the Kubernetes world (especially with the KubernetesContainerFactory),
this is the only really good way of doing it.

To really do this well, our actions should be required to implement
structured logging.  If every log line had the activationId and namespace
info in it, then the logs could stream from the container through an
efficient OpenWhisk-specific logging agent (I had prototyped an agent using
fluent bit last year) to the platform logging service.

If you don't have structured logging from the actions, you can try to
kludge this flow together in various ways but it gets messier.
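
A minimal sketch of what such structured log lines could look like from the action's side (a hypothetical helper, not something OpenWhisk ships; the JSON field names are assumptions):

```javascript
// Hypothetical structured-logging helper: every line is one JSON object
// carrying the activationId and namespace, so an external agent (e.g. fluent
// bit) can route it to the platform logging service without invoker-side
// log collection.
function makeLogger(env = process.env) {
  const base = {
    activationId: env.__OW_ACTIVATION_ID,
    namespace: env.__OW_NAMESPACE,
  };
  return function log(level, message) {
    const line = JSON.stringify({ ...base, level, message });
    console.log(line); // one JSON object per line ("JSON lines")
    return line;
  };
}
```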

--dave




Re: Change the way the java runtime handles envirorment variable

2019-05-02 Thread Tyson Norris
FWIW, I don't think of this as a huge change, meaning no change is required at 
invoker/controller. Rather it is just a convention for context value access 
within functions and function signature that is already unique to each language 
runtime, and requires possibly supporting 2 runtimes per language+convention 
while transitioning to the new convention.  This convention is exposed to 
function developers so "replacing it" is not really an option, I think, but 
rather requires graceful deprecation and migration (with developers 
participation). 

e.g. for nodejs, the old convention used for signature and context is:
- function signature is ` (args)`
- context is `process.env.`

New convention for signature and context is:
- function signature is ` (contextObject, args)`
- context is ` contextObject.`
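
Side by side, a hedged sketch of the two conventions (the context field name below is an assumption about what the new convention might expose):

```javascript
// Old convention: context is read from the process environment, which is
// shared across concurrent activations in the same container.
function mainOld(args) {
  const activationId = process.env.__OW_ACTIVATION_ID;
  return { activationId, p1: args.p1 };
}

// New convention: context arrives as an explicit first argument, so each
// concurrent activation sees its own values.
function mainNew(context, args) {
  return { activationId: context.activation_id, p1: args.p1 };
}
```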

So you can feasibly:
* create a new runtime for nodejs that uses new convention
* add a separate kind for using the new runtime
* gradually phase out use of old kind (communication, deprecation, etc)

I just want to throw this out there because I get the feeling, whenever we 
discuss "context", that there is a misconception around the need to change the 
invoker to support this. There are surely some parts of the invoker/action 
container protocol that may be further cleaned up to isolate static values 
(action name) from per-activation context (activation id), but I don't think 
that is required to start changing the conventions for function signature to 
include context as a separate object from activation params.

Thanks
Tyson


On 5/2/19, 8:54 AM, "David P Grove"  wrote:



Rodric Rabbah  wrote on 05/02/2019 11:05:35 AM:
>
> any existing java function that uses the environment variables would have
> to be modified.
> i would not change it - openwhisk has a uniform model across all the
> runtimes and this would start to diverge... i can be convinced but
instinct
> is to leave it alone.
>

I think the change could be justified if it was part of a broader redesign
to enable concurrent activations in a runtime.
1. pass in a context object instead of stashing activation-specific
params in the environment
2. require proper structured logging

--dave




Re: Tech Interchange meeting Notes & Video posted

2019-03-06 Thread Tyson Norris
Thanks Matt + Rodric
I opened the invoker clustered resource PR here: 
https://github.com/apache/incubator-openwhisk/pull/4326
And added a wiki page with diagram and presentation attached here: 
https://cwiki.apache.org/confluence/display/OPENWHISK/Cluster+Manager+Resources+in+Invoker

Thanks
Tyson

On 3/6/19, 10:17 AM, "Rodric Rabbah"  wrote:

Thanks Matt - I made an editing pass on the notes. Amazing that you can
capture all that, every time!

-r

On Wed, Mar 6, 2019 at 12:26 PM Matt Rutkowski 
wrote:

> Thanks Rodric for hosting with some very interesting and exciting topics
> from Tyson and Rodric himself and looking forward to enabling more
> deterministic Container pool mgmt. along with exploring the browser-based
> paradigms Rodric demonstrated with Digital Ocean droplets.
>
> Video: 
https://youtu.be/49iyv_H4WUs
> Notes:
> 
https://cwiki.apache.org/confluence/display/OPENWHISK/2019-03-06+OW+Tech+Interchange+-+Meeting+Notes
>
> Dragos kindly volunteered to moderate the next meeting on the 20th of
> March.
>
> !!! PLEASE BE AWARE OF TIMEZONE DISCREPANCIES BETWEEN U.S. AND OTHER GEOS
> for this  next sched. meeting until April when the delta returns to normal
> !!!
>
> -mr
>




Re: [Bi-Weekly Tech Interchange] Call for agenda topics

2019-03-04 Thread Tyson Norris
I would like to share details on a forthcoming PR to enable better invoker 
integration with cluster managed (mesos/k8s) action containers. I don't have 
the PR open yet, but plan to by sometime tomorrow. 
Unfortunately I have a Dr appt on March 20, so cannot volunteer - but possibly 
April 3rd. 

Thanks
Tyson

On 3/4/19, 8:16 AM, "Rodric Rabbah"  wrote:

Hi, our next *tech interchange* is in two days. If there is a topic you'd
like to discuss with the community, or some work you'd like to share,
please reply to this email so that I can add you to the agenda.

Bi-weekly Tech Interchange call details:
- Zoom: 
https://zoom.us/my/asfopenwhisk
- Wednesday March 6
11AM EST(Eastern US)
5PM CET (Central Europe),
4PM UTC, 12AM CST (Beijing)

Bonus: If you volunteer early to host the March 20 call, you will win a
prize of distinction.

-r




Re: change the default action context to omit api key

2019-02-13 Thread Tyson Norris
I agree the api_key is bad when not using, e.g., the OW npm package within the 
action. +1 for using an annotation to enable this.

activation_id is required to do the right thing for logging with concurrency 
enabled - but I'm also not sure what risk there is in including it? It will 
still be in the response header anyway, right?

Namespace + action - similar to activation_id, this is already available to the 
client and may have some convenience for action devs (especially with logging 
concurrent activations)

From my perspective, I would just change the api_key to be explicitly passed, 
and leave the rest as-is.

Thanks
Tyson

On 2/13/19, 1:09 PM, "Rodric Rabbah"  wrote:

Hi,

I'm looking for feedback on the following issue:

https://github.com/apache/incubator-openwhisk/issues/4226

Actions receive the API key in the environment even if it is not
necessary. This should not be the default behavior. With the issue I'm
proposing that we flip the default and provide an annotation on the action
to enable the key forwarding to preserve existing behavior.

Additionally, we currently create the following context:
{
   "api_host": process.env['__OW_API_HOST'],
   "api_key": process.env['__OW_API_KEY'],
   "namespace": process.env['__OW_NAMESPACE'],
   "action_name": process.env['__OW_ACTION_NAME'],
   "activation_id": process.env['__OW_ACTIVATION_ID'],
   "deadline": process.env['__OW_DEADLINE']
}


https://github.com/apache/incubator-openwhisk/blob/da21c9fe49b2ae72c95b6866b30d984c65253724/core/invoker/src/main/scala/org/apache/openwhisk/core/containerpool/ContainerProxy.scala#L565-L571

Should we hide the namespace, action name and activation id as well?

-r




concurrency tracking PR

2019-01-30 Thread Tyson Norris
Hi –
I have a PR #4186 ready to merge that has some changes to critical areas of 
ContainerPool for tracking concurrency; specifically so that initing containers 
(not yet warm) can receive activation jobs, instead of the current behavior where 
additional containers will launch. While this only applies to actions that 
support intra-container concurrency, the changes affect the general 
ContainerProxy state management in ContainerPool.
Please review if you have concerns, or forever hold your peace (or until the 
next PR).
Chetan has already reviewed, but it would be great to have another pair of eyes 
there.
https://github.com/apache/incubator-openwhisk/pull/4186

Thanks!
Tyson


Re: Update KindRestrictor to merge namespace and default whitelists

2018-11-28 Thread Tyson Norris
Hi - 
We haven't heard any feedback on this, so will plan to merge this change today. 

Thanks
Tyson

On 11/14/18, 6:48 PM, "Andy Steed"  wrote:

Hello Whiskers!


I wanted to bring up a potentially breaking change to existing 
functionality for the KindRestrictor. Given how recently this functionality was 
added, I do not expect this to affect others. However, I would like to have any 
concerns or objections raised with the group sooner than later.


First, to give a little backstory. Previously, a feature was added to support 
limiting the kinds that were allowed to be used when creating actions for a 
given namespace, as well as support for setting a default whitelist for the 
overall system. In the initial feature, the whitelist set per namespace (if 
present) was used instead of the default whitelist.


Now, I have opened a PR to adjust this logic, such that the explicit namespace 
limit for allowedKinds is viewed as a complement to the default system 
whitelist (if present). I have provided additional specifics in the PR 
description, including the reasoning behind the need for this change.


Cheers,

Andy






Re: [VOTE] Release Apache OpenWhisk 0.9.0-incubating-rc1: OpenWhisk Composer

2018-11-26 Thread Tyson Norris
+1 to release Apache OpenWhisk 0.9.0-incubating: OpenWhisk composer

Verified checklist

Thanks
Tyson

On 11/20/18, 11:30 AM, "David P Grove"  wrote:



This is call for a vote for the release of Apache OpenWhisk
0.9.0-incubating: OpenWhisk composer

List of JIRA ticket(s) resolved for this release can be found at

https://issues.apache.org/jira/browse/INCUBATOR-227.

To learn more about Apache OpenWhisk, please visit

https://openwhisk.apache.org/

This release comprises a source code distribution only. There is only one
module within this release, numbered 0.9.0. The artifacts were built from the
following Git commit IDs:
* openwhisk-composer: 7ae7f08,

The source code artifact of openwhisk composer can be found at:

https://dist.apache.org/repos/dist/dev/incubator/openwhisk/apache-openwhisk-0.9.0-incubating-rc1/openwhisk-composer-0.9.0-incubating-sources.tar.gz

The SHA-512 checksum for the artifact of openwhisk composer is
openwhisk-composer-0.9.0-incubating-sources.tar.gz:
7D5ECB65 DF653840 C8A33C7B DDF346AD 2AA36507 DD1D6DE6 8CB99238 BDC93EF6
425B5BAF 1371C4BE 6F1E0DEF 01D60D18 03AADC33 B0B7BD95 40724450 5FF6D131
which can be found via:

https://dist.apache.org/repos/dist/dev/incubator/openwhisk/apache-openwhisk-0.9.0-incubating-rc1/openwhisk-composer-0.9.0-incubating-sources.tar.gz.sha512

The signature of the artifact of openwhisk composer can be found via:

https://dist.apache.org/repos/dist/dev/incubator/openwhisk/apache-openwhisk-0.9.0-incubating-rc1/openwhisk-composer-0.9.0-incubating-sources.tar.gz.asc

KEYS file is available here:

https://dist.apache.org/repos/dist/dev/incubator/openwhisk/KEYS


Please vote on releasing this package as Apache OpenWhisk 0.9.0-incubating:
OpenWhisk Composer

The vote will be open for at least 72 hours.
[ ] +1 Release as Apache OpenWhisk 0.9.0-incubating: openwhisk composer
[ ] +0 no opinion
[ ] -1 Do not release and the reason

Checklist for reference:
[ ] Download links are valid.
[ ] Checksums and PGP signatures are valid.
[ ] DISCLAIMER is included.
[ ] Source code artifacts have correct names matching the current release.
[ ] LICENSE and NOTICE files are correct for each OpenWhisk repository.
[ ] All files have license headers if necessary.
[ ] No compiled archives bundled in source archive.

regards,

--dave




system tests with external logstore

2018-10-08 Thread Tyson Norris
Hi –
We are successfully using the openwhisk system tests against our deployments, 
but as we near deployments with our external log store, there is the conflict 
that current tests expect:

  *   Logs to be returned with ‘wsk activation get’
  *   Logs to be available immediately

To deal with this, I’ve been considering adding some config to tests to allow:

  *   Boolean flag to signal that logs are only available at `wsk activation 
logs`
  *   Integer config to delay access to those logs (for whatever amount of time 
is suitable for the logstore)

WDYT?
One question is – if there is a notion that at some point ‘wsk activation get’ 
should not be relied on for logs, how do we get to the point where the ‘wsk 
activation get’ API stops including logs? If there is a path to changing this 
API, we can deprecate it and change the tests to use `wsk activation logs` 
without any config flag to signal it. It’s not clear how important this is to 
folks, so I am assuming that making it configurable for test runs would be the 
best option for now, but let me know if you think differently.

Longer term, I’m also wondering if we can make the delay transparent to users 
by leveraging the “start sentinel” that Dave has started at [1]. In that case, 
assuming the “start” AND “end” sentinels are sent to the log store, the 
LogStore impl can determine the difference between “no logs here yet” and 
“logs received, but there were only sentinels (no user logs)”. Currently our 
runtime does NOT include sentinels at all, so devs are left to “reload” log 
requests if they are expecting logs to show up. In our case it is pretty fast, 
a few seconds, but still not a great experience.
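
As a sketch of that idea (the sentinel strings and state names here are assumptions, not the exact markers OpenWhisk emits):

```javascript
// Hypothetical LogStore-side classification using start/end sentinels to
// distinguish "logs not ingested yet" from "ingested, but the action logged
// nothing beyond the sentinels".
const START_SENTINEL = "XXX_THE_START_OF_A_WHISK_ACTIVATION_XXX";
const END_SENTINEL = "XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX";

function classifyActivationLogs(lines) {
  if (!lines.some((l) => l.includes(START_SENTINEL))) return "pending"; // nothing ingested yet
  if (!lines.some((l) => l.includes(END_SENTINEL))) return "streaming"; // still arriving
  const userLines = lines.filter(
    (l) => !l.includes(START_SENTINEL) && !l.includes(END_SENTINEL)
  );
  return userLines.length === 0 ? "empty" : "complete"; // sentinels only vs real logs
}
```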

Thanks
Tyson

[1] https://github.com/apache/incubator-openwhisk/pull/3974


Re: Logstore usage during `activation get`

2018-10-02 Thread Tyson Norris
By "break this" do you mean at some point we should remove the logs from the 
GET?
In any case I will close the PR.

Thanks
Tyson

On 10/2/18, 4:21 PM, "Rodric Rabbah"  wrote:

Hi Tyson - this was the intent of the API design: there is a separate 
resource for LOGS and the RESULT. The reasoning was also that the metadata is 
typically small but the logs could be much larger. Separating the two was also 
intended for easier streaming of the responses.

Because the implementation made it easier to bundle the response, we have 
the current “feature” where GET on the activation id returns the entire record. 
I think we can break this because the clients can sugar the underlying calls.

-r

> On Oct 2, 2018, at 12:07 PM, Tyson Norris  
wrote:
> 
> Hi –
> I created this PR [1] due to noticing that `wsk activation get` does NOT 
return logs from a LogStore which stores logs outside of the Activation entity.
> But it brings up a question: does IBM or any other operator who might 
use a custom LogStore desire to have logs included with `activation get`?
> Currently returning logs is only possible using `wsk activation logs`
> 
> Personally, I think it is “nice” to have a separate explicit request for 
logs and activation metadata, and this is the way that the current OW 
Activation API operates with regards to an external LogStore (splunk, elk, 
etc.), but it is nonetheless inconsistent with the case where logs are NOT 
using external storage.
> 
> WDYT?
> 
> Thanks
> Tyson
> 
> [1] 
https://github.com/apache/incubator-openwhisk/pull/4044





Re: Travis problem with runtime-docker

2018-10-02 Thread Tyson Norris
Finally tested docker-machine and saw the diff between docker-machine and Docker 
for Mac; the same test running on my local mac fails in one but not the other. 
Closing that PR for now; guessing this may be a subtle diff in the docker 
daemon version - at least the working version is in the main branch already, so my 
PR is not needed...


On 10/1/18, 5:12 PM, "Tyson Norris"  wrote:

Of course, I just noticed that when `./gradlew install` is fast, it is because 
the openwhisk/tools/travis/setup.sh script runs ./gradlew 
:tests:compileTestScala, which is indeed much slower. 

I also noticed the gradle versions are different between main repo and 
runtime-docker, so will try out updating runtime-docker to same gradle 
version...

On 10/1/18, 4:45 PM, "Tyson Norris"  wrote:

Hi –
I’m troubleshooting a problem with this PR build [1] where a test fails 
due to a difference in exception handling between akka http client (now used in 
runtime test base class), and apache http client. This change was merged to OW 
master some time ago in [2].

The problem I’m having is that the builds and tests work great locally. 
The travis build is behaving as if the test base classes (built by cloning 
incubator-openwhisk, and running `./gradlew install`) are not installed fresh 
each build, but I cannot see anywhere in travis config or build scripts where 
they might be cached. One reason I suspect caching (aside from the build 
failing) is that the elapsed time spent in `./gradlew install` is 15s in travis 
– this is way faster than I ever see running the same on a clean system locally.

Any ideas?

Thanks
Tyson

[1] 
https://travis-ci.org/apache/incubator-openwhisk-runtime-docker/builds/435891993
[2] 
https://github.com/apache/incubator-openwhisk/commit/15bb04a449f621d262c2687a7b8417241f3856b8







Logstore usage during `activation get`

2018-10-02 Thread Tyson Norris
Hi –
I created this PR [1] due to noticing that `wsk activation get` does NOT return 
logs from a LogStore which stores logs outside of the Activation entity.
But it brings up a question: does IBM, or any other operator who might use a 
custom LogStore, desire to have logs included with `activation get`?
Currently, returning logs is only possible using `wsk activation logs`.

Personally, I think it is “nice” to have a separate explicit request for logs 
and activation metadata, and this is the way that the current OW Activation API 
operates with regard to an external LogStore (Splunk, ELK, etc.), but it is 
inconsistent with the case where logs are NOT using external storage.

WDYT?

Thanks
Tyson

[1] https://github.com/apache/incubator-openwhisk/pull/4044


Re: Travis problem with runtime-docker

2018-10-01 Thread Tyson Norris
Of course, I just noticed that when `./gradlew install` is fast, it is because the 
openwhisk/tools/travis/setup.sh script runs ./gradlew :tests:compileTestScala, 
which is indeed much slower. 

I also noticed the gradle versions are different between main repo and 
runtime-docker, so will try out updating runtime-docker to same gradle 
version...

On 10/1/18, 4:45 PM, "Tyson Norris"  wrote:

Hi –
I’m troubleshooting a problem with this PR build [1] where a test fails due 
to a difference in exception handling between akka http client (now used in 
runtime test base class), and apache http client. This change was merged to OW 
master some time ago in [2].

The problem I’m having is that the builds and tests work great locally. The 
travis build is behaving as if the test base classes (built by cloning 
incubator-openwhisk, and running `./gradlew install`) are not installed fresh 
each build, but I cannot see anywhere in travis config or build scripts where 
they might be cached. One reason I suspect caching (aside from the build 
failing) is that the elapsed time spent in `./gradlew install` is 15s in travis 
– this is way faster than I ever see running the same on a clean system locally.

Any ideas?

Thanks
Tyson

[1] 
https://travis-ci.org/apache/incubator-openwhisk-runtime-docker/builds/435891993
[2] 
https://github.com/apache/incubator-openwhisk/commit/15bb04a449f621d262c2687a7b8417241f3856b8





Travis problem with runtime-docker

2018-10-01 Thread Tyson Norris
Hi –
I’m troubleshooting a problem with this PR build [1] where a test fails due to 
a difference in exception handling between akka http client (now used in 
runtime test base class), and apache http client. This change was merged to OW 
master some time ago in [2].

The problem I’m having is that the builds and tests work great locally. The 
travis build is behaving as if the test base classes (built by cloning 
incubator-openwhisk, and running `./gradlew install`) are not installed fresh 
each build, but I cannot see anywhere in travis config or build scripts where 
they might be cached. One reason I suspect caching (aside from the build 
failing) is that the elapsed time spent in `./gradlew install` is 15s in travis 
– this is way faster than I ever see running the same on a clean system locally.

Any ideas?

Thanks
Tyson

[1] 
https://travis-ci.org/apache/incubator-openwhisk-runtime-docker/builds/435891993
[2] 
https://github.com/apache/incubator-openwhisk/commit/15bb04a449f621d262c2687a7b8417241f3856b8



Re: Bi-weekly Tech Interchange call tomorrow - add agenda topics here

2018-09-11 Thread Tyson Norris
Hi - 
I'd like to discuss the pending concurrency PR here:
https://github.com/apache/incubator-openwhisk/pull/2795

Specific controversial topics:
- adding a synchronized block during concurrency>1 action scheduling when the 
activation requires a new container
- not adding indication of which runtimes support concurrency (leave it to 
operator config for now)

Thanks
Tyson

On 9/11/18, 2:41 PM, "Matt Rutkowski"  wrote:

Hi Whiskers!

Please add to this thread (or send to Markus directly, our guest host) any 
agenda items you would like to discuss at the Tech Interchange call 
tomorrow

Call details:

Web Meeting: Tech Interchange (bi-weekly):
- Day-Time: Wednesdays, 11AM EDT (Eastern US), 5PM CEST (Central Europe),
3PM UTC, 11PM CST (Beijing)
- Zoom: 
https://zoom.us/my/asfopenwhisk


-mr






Re: Prototyping for a future architecture

2018-08-28 Thread Tyson Norris
Changing language is a big leap that seems unrelated to the basic design 
principles we are discussing in the proposal. 
If performance is the main concern, what kind of performance difference would be 
worth this effort? It will be hard (or impossible) to measure until it's done, 
and the only guarantee is that it will take a lot longer to get done with a 
change of language. 
If the ContainerManager kube CLI is a concern, what is missing from the existing 
scala->kubectl integration? And is there any reason not to use the HTTP API 
instead of an SDK/CLI?

I would suggest at least getting to a point where either of these (performance, 
functionality) manifest as tangible problems in context of a new design, and 
consider changing at that time. 

Maybe I am assuming too much, but it seems like the ContainerPool/ContainerProxy 
portion of the invoker does not need rewriting immediately - only that a) the 
ContainerManager portion of the pool workflow needs to be managed at a cluster 
level and b) the activation processing should be handled at the ContainerRouter 
via HTTP (not kafka). Are you instead suggesting the ContainerPool be 
completely rewritten? 

Thanks
Tyson


On 8/28/18, 2:53 PM, "Rodric Rabbah"  wrote:

Thanks Michael for raising these points. I share the same opinion and
sentiment and think a branch with a clean migration story is better and
makes more sense. I am not entirely convinced that the choice of language
itself will make the difference vs the new architecture which is quite
different and should in itself be more efficient.

-r

On Tue, Aug 28, 2018 at 4:51 PM Michael Marth 
wrote:

> Hi Markus,
>
> IMHO what you propose below is a rather severe change in scope of this
> discussion and effort.
> Up until so far this was about _evolving_ the OW architecture. We have not
> explicitly discussed it, but one could assume that it is at least feasible
> to gradually adopt the new architecture. So there would be a smooth path
> between the current state of the code base and a future one.
>
> Your proposal below breaks this assumption somewhat (by proposing a new
> repo instead of a branch - which will inevitably make the 2 code bases
> drift apart) as well as explicitly by suggesting a new implementation
> language. Especially the latter would create a schism between OW-now and
> OW-future.
> This schism has implications like the perception of OW-now being
> deprecated, the _possibility_ of no clean upgrade path, the immediate 
split
> of the community between *-now and *-future and of course carries the risk
> of the version 2 syndrome.
>
> I would propose to implement the future architecture in a branch and in
> Scala first. If it turns out to be good, then subsequent experiments can
> show or not-show if a switch of language is of additional value. That 
would
> allow to make a decision based on data rather than anything else.
>
> My2c
> Michael
>
>
> On 28.08.18, 14:26, "Markus Thömmes"  wrote:
>
> Hi all,
>
> Am Mo., 27. Aug. 2018 um 20:04 Uhr schrieb David P Grove <
> gro...@us.ibm.com
> >:
>
> >
> >
> >
> > "Markus Thömmes"  wrote on 08/23/2018
> 04:19:33
> > PM:
> >
> > >
> > > Key point I want to make is: At some point we'll have to start to
> > prototype
> > > things out and see if our assumptions actually hold water. For
> example,
> > my
> > > assumption on a work-stealing backend is pretty much in the air.
> > >
> > > My proposal for going forward would be:
> > > 1. Create a playground for the implementation of some parts of the
> system
> > > (a new repository?)
> > > 2. Let's build some of the things that are uncontroversial and
> absolutely
> > > needed in any case (ContainerRouter, ContainerManager).
> > > 3. Play around with them, hook them up in different ways, see what
> works
> > > and what doesn't.
> > >
> > > Some things will need some testing out to see the scale that the
> > components
> > > can operate at. These things will narrow or widen the solution
> space for
> > > the more controversial topics around how to distribute containers
> in the
> > > system, how to balance between the routers, work-stealing queue:
> yes/no
> > etc.
> > >
> > > Having some simple components fleshed out could encourage
> innovation and
> > > creates some facts that we need to focus things into a good
> direction.
> > >
> > > What do you think? Too early to start with this and/or the wrong
> way of
> > > doing it?
> > >
> >
> > +1 for starting to prototype.  It's been a 

Bi-weekly Tech Interchange call tomorrow

2018-08-28 Thread Tyson Norris
Hi Whiskers!

Please send any agenda items you would like to discuss at the Tech Interchange 
call tomorrow.
Thanks
Tyson

Call details:
Web Meeting: Tech Interchange (bi-weekly):
- Day-Time: Wednesdays, 11AM EDT (Eastern US), 5PM CEST (Central Europe),
3PM UTC, 11PM CST (Beijing)
- Zoom: https://zoom.us/my/asfopenwhisk


Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-23 Thread Tyson Norris
> The Router is not pulling from the queue for "specific actions", just for any
> action that might replace idle containers - right? This is complicated with
> concurrency though, since while a container is not idle (paused +
> removable), it may be usable, but only if the action received is the same
> as one existing warm container, and that container has concurrency slots
> available for additional activations. It may be helpful to diagram some of
> this stealing queue flow a bit more, I'm not seeing how it will work out
> other than creating more containers than is absolutely required, which may
> be ok, not sure.
>

Yes, I will diagram things out soonish, I'm a little bit narrow on time
currently.

The idea is that indeed the Router pulls for *specific* actions. This is a
problem when using Kafka, but might be solvable when we don't require
Kafka. I have to test this for feasibility though.


Hmm, OK - it's not clear how a router that is empty (not servicing any 
activations) becomes a router that is pulling for that specific action, when 
other routers pulling for that action are at capacity (so new containers are 
needed).




Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-23 Thread Tyson Norris
>
> And each ContainerRouter has a queue consumer that presumably pulls from
> the queue constantly? Or is consumption based on something else? If all
> ContainerRouters are consuming at the same rate, then while this does
> distribute the load across ContainerRouters, it doesn't really guarantee
> any similar state (number of containers, active connections, etc) at each
> ContainerRouter, I think. Maybe I am missing something here?
>


The idea is that ContainerRouters do **not** pull from the queue
constantly. They pull work for actions that they have idle containers for.

The Router is not pulling from the queue for "specific actions", just for any 
action that might replace idle containers - right? This is complicated with 
concurrency though, since while a container is not idle (paused + removable), it 
may be usable, but only if the action received is the same as one existing warm 
container, and that container has concurrency slots available for additional 
activations. It may be helpful to diagram some of this stealing queue flow a 
bit more; I'm not seeing how it will work out other than creating more 
containers than is absolutely required, which may be ok, not sure. 
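
To make the ramp-up concern concrete, here is a toy simulation (my assumptions, not the proposal's protocol: three routers, no intra-container concurrency, requests arriving before any container exists) of the proxy-then-503-then-create flow discussed in this thread:

```python
# Toy model of the ramp-up flow: a router first tries its own warm
# containers, then proxies to peers; a peer whose only containers are
# busy answers 503, and the proxying router finally creates its own
# container.

class Router:
    def __init__(self, name):
        self.name = name
        self.containers = []  # each entry: {"busy": bool}

    def try_run(self):
        # accept only if a container has a free slot; otherwise "503"
        for c in self.containers:
            if not c["busy"]:
                c["busy"] = True
                return True
        return False

def handle(router, peers):
    if router.try_run():
        return f"{router.name}: reused warm container"
    for p in peers:
        if p.try_run():
            return f"{router.name} -> {p.name}: proxied"
    router.containers.append({"busy": True})  # all peers returned 503
    return f"{router.name}: created container"

routers = [Router("R1"), Router("R2"), Router("R3")]
# Three requests land on three different routers while every existing
# container is busy: each router ends up creating its own container.
results = [handle(r, [p for p in routers if p is not r]) for r in routers]
```

Under these assumptions the cross-traffic during ramp-up still ends with one container per router, which is the "more containers than absolutely required" worry.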

Similar state in terms of number of containers is done via the
ContainerManager. Active connections should roughly even out with the queue
being pulled on idle.

Yeah, carefully defining "idle" may be tricky if we want to achieve the absolute 
minimum number of containers in use for a specific action at any time.


>
> The edge-case here is for very slow load. It's minimizing the amount 
of
> Containers needed. Another example:
> Say you have 3 Routers. A request for action X comes in, goes to
> Router1.
> It requests a container, puts the work on the queue, nobody steals it,
> as
> soon as the Container gets ready, the work is taken from the queue and
> executed. All nice and dandy.
>
> Important remark: The Router that requested more Containers is not
> necessarily the one that's getting the Containers. We need to make
> sure to
> evenly distribute Containers across the system.
>
> So back to our example: What happens if requests for action X are made
> one
> after the other? Well, the layer above the Routers (something needs to
> loadbalance them, be it DNS or another type of routing layer) isn't
> aware
> of the locality of the Container that we created to execute action X.
> As it
> schedules fairly randomly (round-robin in a multi-tenant system is
> essentially random) the action will hit each Router once very soon. As
> we're only generating one request after the other, arguably we only
> want to
> create only one container.
>
> That's why in this example the 2 remaining Routers with no container
> get a
> reference to Router1.
>
> In the case you mentioned:
> > it seems like sending to another Router which has the container, but
> may
> not be able to use it immediately, may cause failures in some cases.
>
> I don't recall if it's in the document or in the discussion on the
> dev-list: The router would respond to the proxied request with a 503
> immediatly. That tells the proxying router: Oh, apparently we need 
more
> resources. So it requests another container etc etc.
>
> Does that clarify that specific edge-case?
>
> Yes, but I would not call this an edge-case -  I think it is more of a
> ramp up to maximum container reuse, and will probably dramatically 
impacted
> by containers that do NOT support concurrency (will get a 503 when a 
single
> activation is in flight, vs high concurrency container, which would cause
> 503 only once max concurrency reached).
> If each ContainerRouter is as likely to receive the original request, and
> each is also as likely to receive the queued item from the stealing queue,
> then there will be a lot of cross traffic during the ramp up from 1
> container to  containers. E.g.
>
> From client:
> Request1 -> Router 1 -> queue (no containers)
> Request2 -> Router 2 -> queue (no containers)
> Request3 -> Router 3 -> queue (no containers)
> From queue:
> Request1 -> Router1  -> create and use container
> Reuqest2 -> Router2 -> Router1 -> 503 -> create container
> Request3 -> Router3 -> Router1 -> 503 -> Router2 -> 503 -> create 
container
>
> In other words - the 503 may help when there is one container existing,
> and it is deemed to be busy, but what if there are 10 containers existing
> (on different Routers other than where the request was pulled from the
> stealing queue) - do you make HTTP requests to all 10 Routers to see if
> they are busy before creating a new 

Re: Simplifying configuring OpenWhisk Components (#3984)

2018-08-23 Thread Tyson Norris
Hi Chetan - 
As mentioned in the linked DCOS issue, using a marathon "uri" for config files 
(fetched before container is started) is an option - is there any reason that 
won't work for us?
Files will end up in /mnt/mesos/sandbox in the container, not sure if that path 
matters (we can copy the files too, or else change the classpath a bit)


On 8/23/18, 2:20 AM, "Chetan Mehrotra"  wrote:

> How will this impact other deployment tools, like Docker Compose? I'm
aware
that your change keeps the old path working, but do we envision to drop
that at some point?

Docker compose can also make use of a config mount to make the config file
available. As to dropping support for the current transformation support ...
this we can decide sometime in the future, once we put the proposed approach in use
and see if it covers all usecases. For example, one possible case can be
where the container orchestrator has some way to inject credentials from a
secret store. That would work with simple env variables but is tricky to get
working with stuff embedded in a file.

> Would it be valuable to have a writeup in the Wiki outlining how we'd
envision a future configuration method to work across all the deployment
methods that we have? I feel like there's lots of simplification to get
especially for cases like Mesos and/or Kubernetes, if we drop the env-var
rewriting.

Would work on a write up. k8s ConfigMap support is quite useful.
Unfortunately Mesos/Marathon does not have such a feature [2]. So we have
to make it work with cramming stuff in env variables only for now.

As of now, env-var rewriting works without much overhead, so it should be ok to
continue supporting that. Just that, as part of the official documentation, steps
around any configuration should provide examples only for the typesafe config way.

Chetan Mehrotra
[2] 
https://dcosjira.atlassian.net/browse/DCOS-516
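
For illustration, the layered-config layout discussed above might look like this (file and path names are assumptions, not the actual deployment layout):

```hocon
# /controller/config/application.conf - mounted from a ConfigMap.
# With -Dconfig.file=/controller/config/application.conf (or env
# CONFIG_config_file) set, later includes supersede earlier ones.
include "database.conf"
include "controller.conf"
```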


On Thu, Aug 23, 2018 at 2:12 PM Markus Thömmes 
wrote:

> Hi Chetan,
>
> Am Do., 23. Aug. 2018 um 10:28 Uhr schrieb Chetan Mehrotra <
> chetan.mehro...@gmail.com>:
>
> > > Is it possible to layer configurations with this? For example: If I
> > create
> > a `database.conf` and a `controller.conf`, is there a way to mount these
> in
> > a way that they are both read and merged by the specific component?
> >
> > This should be possible as of now also. If file mount option is used [1]
> > then you need to
> >
> > 1. Mount configMap directory  `/controller/config`
> > 2. Have one `application.conf` in that directory which has includes for
> > `database.conf` and `controller.conf`
> >
>
> Ahhh thanks, I completely forgot about includes!
>
>
> >
> > Then with changes done for #3059 these files would be included in
> classpath
> > and typesafe config logic to read all "application.conf" would get them
> > included.
> >
> > Only issue is order is non deterministic so config override would not
> work.
> > That can be enabled by setting system property
> > -Dconfig.file=/controller/config/application.conf
> > (or env CONFIG_config_file). This would ensure that any later includes
> > would supersede previous includes
> >
>
> Sounds good.
>
> How will this impact other deployment tools, like Docker Compose? I'm 
aware
> that your change keeps the old path working, but do we envision to drop
> that at some point?
>
> Would it be valuable to have a writeup in the Wiki outlining how we'd
> envision a future configuration method to work across all the deployment
> methods that we have? I feel like there's lots of simplification to get
> especially for cases like Mesos and/or Kubernetes, if we drop the env-var
> rewriting.
>
>
> >
> >
> > Chetan Mehrotra
> > [1]
> >
> >
> 
https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#populate-a-volume-with-data-stored-in-a-configmap
> >
> > On Thu, Aug 23, 2018 at 1:32 PM Markus Thömmes <
> markusthoem...@apache.org>
> > wrote:
> >
> > > Hi Chetan,
> > >
> > > good idea!
> > >
> > > A bit of background on why it is how it is: When I implemented the
> > approach
> > > we're having today, the basic thought was to be able to 

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-22 Thread Tyson Norris
Hi - thanks for the discussion! More inline...

On 8/22/18, 2:55 PM, "Markus Thömmes"  wrote:

Hi Tyson,

Am Mi., 22. Aug. 2018 um 23:37 Uhr schrieb Tyson Norris
:

> Hi -
> >
> > When exactly is the case that a ContainerRouter should put a 
blocking
> > activation to a queue for stealing? Since a) it is not spawning
> containers
> > and b) it is not parsing request/response bodies, can we say this
> would
> > only happen when a ContainerRouter maxes out its incoming request
> handling?
> >
>
> That's exactly the idea! The work-stealing queue will only be used if
> the
> Router where to request landed cannot serve the demand right now. For
> example, if it maxed out the slots it has for a certain action (all
> containers are working to their full extent) it requests more
> resources and
> puts the request-token on the work-stealing queue.
>
> So to clarify, ContainerRouter "load" (which can trigger use of queue) is
> mostly (only?) based on:
> * the number of Container references
> * the number of outstanding inbound  HTTP requests, e.g. when lots of
> requests can be routed to the same container
> * the number of outstand outbound HTTP requests to remote action
> containers (assume all are remote)
> It is unclear the order of magnitude considered for "maxed out slots",
> since container refs should be simple (like ip+port, action metadata,
> activation count, and warm state), inbound connection handling is 
basically
> a http server, and outbound is a connection pool per action container
> (let's presume connection reuse for the moment).
> I think it will certainly need testing to determine these and to be
> configurable in any case, for each of these separate stats.. Is there
> anything else that affects the load for ContainerRouter?
>

"Overload" is determined by the availability of free slots on any container
being able to serve the current action invocation (or rather the absence
thereof). An example:
Say RouterA has 2 containers for action X. Each container has an allowed
concurrency of 10. On each of those 2 there are 10 active invocations
already running (the ContainerRouter knows this, these are open connections
to the containers). If another request comes in for X, we know we don't
have capacity for it. We request more resources and offer the work we got
for stealing.

I don't think there are tweaks needed here. The Router keeps an
"activeInvocations" number per container and compares that to the allowed
concurrency on that container. If activeInvocations == allowedConcurrency
we're out of capacity and need more.

We need a work-stealing queue here to dynamically rebalance between the
Routers since the layer above the Routers has no idea about capacity and
(at least that's my assumption) schedules randomly.
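
A minimal sketch of that capacity rule (names are mine, not the proposal's):

```python
# Per-action capacity check: a router tracks active invocations per
# container and compares against the allowed concurrency. When no
# container has a free slot, the router requests more resources and
# offers the work for stealing.

def pick_container(active_invocations, allowed_concurrency):
    # active_invocations: list of in-flight counts, one per container
    # returns an index with free capacity, or None if overloaded
    for i, active in enumerate(active_invocations):
        if active < allowed_concurrency:
            return i
    return None  # out of capacity -> request container + offer for stealing
```

In the example above, two containers each with 10 active invocations and an allowed concurrency of 10 yield no pick, triggering the work-stealing path.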

I think it is confusing to say that the ContainerRouter doesn't have capacity 
for it - rather, the existing set of containers in the ContainerRouter don't 
have capacity for it. I understand now, in any case.
So there are a couple of active paths in ContainerRouter, still only 
considering sync/blocking activations:
* warmpath - run immediately
* coldpath - send to queue

And each ContainerRouter has a queue consumer that presumably pulls from the 
queue constantly? Or is consumption based on something else? If all 
ContainerRouters are consuming at the same rate, then while this does 
distribute the load across ContainerRouters, it doesn't really guarantee any 
similar state (number of containers, active connections, etc) at each 
ContainerRouter, I think. Maybe I am missing something here?
  




>
> That request-token will then be taken by any Router that has free
> capacity
> for that action (note: this is not simple with kafka, but might be
> simpler
> with other MQ technologies). Since new resources have been requested,
> it is
> guaranteed that one Router will eventually become free.
>
> Is "requests resources" here requesting new action containers, which it
> won't be able to process itself immediately, but which should start up + warm and
> be provided to "any ContainerRouter"? This makes sense, just want to
> clarify that "resources == containers".
>

Yes, resources == containers.


>
> >
> > If ContainerManager has enough awareness of ContainerRouters'
> states, I'm
> > not sure where using a queue would b

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-22 Thread Tyson Norris
Hi - 
>
> When exactly is the case that a ContainerRouter should put a blocking
> activation to a queue for stealing? Since a) it is not spawning containers
> and b) it is not parsing request/response bodies, can we say this would
> only happen when a ContainerRouter maxes out its incoming request 
handling?
>

That's exactly the idea! The work-stealing queue will only be used if the
Router where to request landed cannot serve the demand right now. For
example, if it maxed out the slots it has for a certain action (all
containers are working to their full extent) it requests more resources and
puts the request-token on the work-stealing queue.

So to clarify, ContainerRouter "load" (which can trigger use of queue) is 
mostly (only?) based on:
* the number of Container references 
* the number of outstanding inbound HTTP requests, e.g. when lots of requests 
can be routed to the same container
* the number of outstanding outbound HTTP requests to remote action containers 
(assume all are remote)
It is unclear the order of magnitude considered for "maxed out slots", since 
container refs should be simple (like ip+port, action metadata, activation 
count, and warm state), inbound connection handling is basically a http server, 
and outbound is a connection pool per action container (let's presume 
connection reuse for the moment).
I think it will certainly need testing to determine these and to be 
configurable in any case, for each of these separate stats.. Is there anything 
else that affects the load for ContainerRouter?

That request-token will then be taken by any Router that has free capacity
for that action (note: this is not simple with kafka, but might be simpler
with other MQ technologies). Since new resources have been requested, it is
guaranteed that one Router will eventually become free.

Is "requests resources" here requesting new action containers, which it won't 
be able to process itself immediately, but which should start up + warm and be 
provided to "any ContainerRouter"? This makes sense, just want to clarify that 
"resources == containers".

>
> If ContainerManager has enough awareness of ContainerRouters' states, I'm
> not sure where using a queue would be used (for redirecting to other
> ContainerRouters) vs ContainerManager responding with a ContainerRouters
> reference (instead of an action container reference) - I'm not following
> the logic of the edge case in the proposal - there is mention of "which
> controller the request needs to go", but maybe this is a typo and should
> say ContainerRouter?
>

Indeed that's a typo, it should say ContainerRouter.

The ContainerManager only knows which Router has which Container. It does
not know whether the respective Router has capacity on that container (the
capacity metric is very hard to share since it's ever changing).

Hence, in an edge-case where there are less Containers than Routers, the
ContainerManager can hand out references to the Routers it gave Containers
to the Routers that have none. (This is the edge-case described in the
proposal).
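
A sketch of that edge case (illustrative only - the real ContainerManager API is not defined here): with fewer containers than routers, container-less routers receive a reference to a peer router instead of a container.

```python
# Hypothetical assignment: each router receives either its own container
# or a reference to a peer router that holds one.

def assign(routers, container_holders):
    assignments = {}
    for r in routers:
        if r in container_holders:
            assignments[r] = ("container", r)
        else:
            # hand out references to container-holding routers round-robin
            peer = container_holders[len(assignments) % len(container_holders)]
            assignments[r] = ("peer", peer)
    return assignments
```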

I'm not sure why, in this case, the ContainerManager does not just create a new 
container instead of sending to another Router. If there is some intended 
limit on the "number of containers for a particular action", that would be a 
reason, but given that the ContainerManager cannot know the state of the 
existing containers, it seems like sending to another Router which has the 
container, but may not be able to use it immediately, may cause failures in 
some cases. 


The work-stealing queue though is used to rebalance work in case one of the
Routers get overloaded.

Got it.

Thanks
Tyson
 



Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-22 Thread Tyson Norris
Yes, agreed this makes sense, same as Carlos is saying. 

Let's ignore async for now, I think that one is simpler - does "A blocking 
request can still be put onto the work-stealing queue" mean that it wouldn't 
always be put on the queue? 

If there is existing warm container capacity in the ContainerRouter receiving 
the activation, ideally it would skip the queue - right? 

When exactly is the case that a ContainerRouter should put a blocking 
activation to a queue for stealing? Since a) it is not spawning containers and 
b) it is not parsing request/response bodies, can we say this would only happen 
when a ContainerRouter maxes out its incoming request handling? 

If the ContainerManager has enough awareness of the ContainerRouters' states, I'm 
not sure when a queue would be used (for redirecting to other 
ContainerRouters) vs. the ContainerManager responding with a ContainerRouter 
reference (instead of an action container reference) - I'm not following the 
logic of the edge case in the proposal - there is mention of "which controller 
the request needs to go", but maybe this is a typo and should say 
ContainerRouter?

Thanks
Tyson

On 8/21/18, 1:16 AM, "Markus Thömmes"  wrote:

Hi Tyson,

if we take the concerns apart as I proposed above, timeouts should only
ever be triggered after a request is scheduled as you say, that is: As soon
as it's crossing the user-container mark. With the concern separation, it
is plausible that blocking invocations are never buffered anywhere, which
makes a lot of sense, because you cannot persist the open HTTP connection
to the client anyway.

To make the distinction clear: A blocking request can still be put onto the
work-stealing queue to be balanced between different ContainerRouters.

A blocking request though would never be written to a persistent buffer
that's used to be able to efficiently handle async invocations and
backpressuring them. That buffer should be entirely separate and could
possibly be placed outside of the execution system to make the distinction
more explicit. The execution system itself would then only deal with
request-response style invocations and asynchronous invocations are done by
having a seperate queue and a consumer that creates HTTP requests to the
execution system.
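That separation could be sketched roughly as follows (Python for brevity - illustrative only, and combining it with the timeout-at-dequeue idea discussed later in this thread; `AsyncBuffer` and the constants are assumptions, not part of the proposal):

```python
import time
from collections import deque

TIMEOUT_SECONDS = 0.05  # stand-in for the action's time limit

class AsyncBuffer:
    """Persistent-buffer stand-in, kept outside the execution system.

    Blocking invocations never enter this buffer (their HTTP connection can't
    be persisted anyway); async invocations do. The action timeout clock
    starts at dequeue, not enqueue, so buffering delay never eats into the
    action's time budget; the enqueue timestamp is kept only to feed a
    "max delay threshold" alarm."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, activation):
        self._queue.append((activation, time.monotonic()))

    def dequeue(self):
        activation, enqueued_at = self._queue.popleft()
        queue_delay = time.monotonic() - enqueued_at   # for monitoring/alerting
        deadline = time.monotonic() + TIMEOUT_SECONDS  # timeout begins here
        return activation, queue_delay, deadline

buf = AsyncBuffer()
buf.enqueue("trigger-fired-action")
time.sleep(0.01)  # simulate the activation sitting in the buffer
activation, delay, deadline = buf.dequeue()
remaining = deadline - time.monotonic()  # still (almost) the full budget
```

A consumer reading this buffer would then make ordinary request-response invocations against the execution system, as described above.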

Cheers,
Markus

On Mon., Aug 20, 2018 at 23:30, Tyson Norris
wrote:

> Thanks for summarizing Markus.
>
> Yes this is confusing in the context of the current system, which stores in kafka,
> but does not wait indefinitely, since the timeout begins immediately.
> So, I think the problem of buffering/queueing is: when does the timeout
> begin? If not everything is buffered the same, their timeout should not
> begin until processing begins.
>
> Maybe it would make sense to:
> * always buffer (indefinitely) to queue for async, never for sync
> * timeout for async not started till read from queue - which may be
> delayed from time of trigger or http request
> * this should also come with some system monitoring to indicate the queue
> processing is not keeping up with some configurable max delay threshold ("I
> can’t tolerate delays of > 5 minutes", etc)
> * ContainerRouters can only pull from async queue when
> * increasing the number of pending activations won’t exceed some
> threshold (prevent excessive load of async on ContainerRouters)
> * ContainerManager is not overloaded (can still create containers,
> or has some configurable way to indicate the cluster is healthy enough to
> cope with extra processing)
>
> We could of course make this configurable so that operators can choose to:
> * treat async/sync activations the same for sync/async (the overloaded
> system fails when either ContainerManager or ContainerRouters are max
> capacity)
> * treat async/sync with preference for:
> * sync - where async is buffered for unknown period before
> processing, incoming sync traffic (or lack of)
> * async - where sync is sent to the queue, to be processed in
> order of receipt interleaved with async traffic (similar to today, I think)
>
> I think the impact here (aside from technical) is the timing difference if
> we introduce latency in side affects based on the activation being sync vs
> async.
>
> I’m also not sure prioritizing message processing between sync/async
> internally in ContainerRouter is better than just have some dedicated
> ContainerRouters that receive all async activations, and others that
> receive all sync activations, but the end result is the same, I think.
>
>
> > On Aug 19, 201

Re: Proposal on a future architecture of OpenWhisk

2018-08-21 Thread Tyson Norris
> Tracking these metrics consistently will introduce the same problem as
> precisely tracking throttling numbers across multiple controllers, I think,
> where either there is delay introduced to use remote data, or eventual
> consistency will introduce inaccurate data.
>

If you're talking about limit enforcement, you're right! Regarding the
concurrency on each container though, we are able to accurately track that
and we need to be able to make sure that actual concurrency is always <= C.


>
> I’m interested to know if this accuracy is important as long as actual
> concurrency <= C?
>

I don't think it is as much, no. But how do you keep <= C if you don't
accurately track?

Maybe I should say that while we cannot accurately track, we can still 
guarantee <= C, we just cannot guarantee maximizing concurrency up to C.

Since the HTTP requests are done via futures in proxy, the messaging between 
pool and proxy doesn't have an accurate way to get exactly C requests in 
flight, but can prevent ever sending > C messages that cause the HTTP requests. 
The options for this are:
- track in flight requests in the pool; passing C will cause more containers to 
be used, but probably the container will always only have < C in flight.
- track in flight requests in the proxy; passing C will cause the message in 
proxy to be stashed/delayed until some HTTP requests are completed, and if the 
>C state remains, the pool will eventually learn this state and cause more 
containers to be used.

(current impl in the PR does the latter) 
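A minimal sketch of that "track in the proxy" option (Python for brevity - illustrative only; the actual implementation stashes akka messages in the ContainerProxy actor, and all names here are assumptions):

```python
import collections
import threading

C = 2  # per-container concurrency limit

class ContainerProxy:
    """Messages beyond C are stashed until an in-flight request completes.

    This guarantees in-flight <= C, though it may briefly run below C -
    i.e. we can bound concurrency, but not maximize it up to C."""

    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0
        self.stashed = collections.deque()
        self.peak = 0
        self._lock = threading.Lock()

    def submit(self, activation):
        with self._lock:
            if self.in_flight < self.limit:
                self._start(activation)
            else:
                self.stashed.append(activation)  # delayed, not rejected

    def _start(self, activation):
        # Stand-in for firing the HTTP request future to the container.
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)

    def complete(self):
        with self._lock:
            self.in_flight -= 1
            if self.stashed:
                self._start(self.stashed.popleft())

proxy = ContainerProxy(C)
for i in range(5):
    proxy.submit(f"act-{i}")  # 2 start, 3 are stashed
proxy.complete()              # one finishes, one stashed item starts
```

If the >C state persists, the pool eventually observes it and allocates more containers, as described above.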

Thanks
Tyson
 



Re: Proposal on a future architecture of OpenWhisk

2018-08-20 Thread Tyson Norris


On Aug 19, 2018, at 3:59 AM, Markus Thömmes <markusthoem...@apache.org> wrote:

Hi Tyson,

On Fri., Aug 17, 2018 at 23:45, Tyson Norris <tnor...@adobe.com.invalid> wrote:


If the failover of the singleton is too long (I think it will be based on
cluster size, oldest node becomes the singleton host iirc), I think we need
to consider how containers can launch in the meantime. A first step might
be to test out the singleton behavior in the cluster of various sizes.


I agree this bit of design is crucial, a few thoughts:
Pre-warm wouldn't help here, the ContainerRouters only know warm
containers. Pre-warming is managed by the ContainerManager.

Ah right


Considering a fail-over scenario: We could consider sharing the state via
EventSourcing. That is: All state lives inside of frequently snapshotted
events and thus can be shared between multiple instances of the
ContainerManager seamlessly. Alternatively, we could also think about only
working on persisted state. That way, a cold-standby model could fly. We
should make sure that the state is not "slightly stale" but rather both
instances see the same state at any point in time. I believe on that
cold-path of generating new containers, we can live with the extra-latency
of persisting what we're doing as the path will still be dominated by the
container creation latency.

Wasn’t clear if you mean not using ClusterSingleton? To be clear in
ClusterSingleton case there are 2 issues:
- time it takes for akka ClusterSingletonManager to realize it needs to
start a new actor
- time it takes for the new actor to assume a usable state

EventSourcing (or ext persistence) may help with the latter, but we will
need to be sure the former is tolerable to start with.
Here is an example test from akka source that may be useful (multi-jvm,
but all local):

https://github.com/akka/akka/blob/009214ae07708e8144a279e71d06c4a504907e31/akka-cluster-tools/src/multi-jvm/scala/akka/cluster/singleton/ClusterSingletonManagerChaosSpec.scala

Some things to consider, that I don’t know details of:
- will the size of cluster affect the singleton behavior in case of
failure? (I think so, but not sure, and what extent); in the simple test
above it takes ~6s for the replacement singleton to begin startup, but if
we have 100s of nodes, I’m not sure how much time it will take. (I don’t
think this should be hard to test, but I haven’t done it)
- in case of hard crash, what is the singleton behavior? In graceful jvm
termination, I know the cluster behavior is good, but there is always this
question about how downing nodes will be handled. If this critical piece of
the system relies on akka cluster functionality, we will need to make sure
that the singleton can be reconstituted, both in case of graceful
termination (restart/deployment events) and non-graceful termination (hard
vm crash, hard container crash) . This is ignoring more complicated cases
of extended network partitions, which will also have bad effects on many of
the downstream systems.


I don't think we need to be eager to consider akka-cluster to be set in
stone here. The singleton in my mind doesn't need to be clustered at all.
Say we have a fully shared state through persistence or event-sourcing and
a hot-standby model, couldn't we implement the fallback through routing in
front of the active/passive ContainerManager pair? Once one goes
unreachable, fall back to the other.



Yeah I would rather see the hot standby and deal with persistence. I don’t 
think akka ClusterSingleton is going to be fast enough in a high-volume 
scenario.
Either routing in front, or having ContainerRouters observe the active (leader) 
status, could work - we just have to determine that the status change is 
tolerably fast.





Handover time as you say is crucial, but I'd say as it only impacts
container creation, we could live with, let's say, 5 seconds of
failover-downtime on this path? What's your experience been on singleton
failover? How long did it take?


Seconds in the simplest case, so I think we need to test it in a scaled
case (100s of cluster nodes), as well as the hard crash case (where not
downing the node may affect the cluster state).




On Aug 16, 2018, at 11:01 AM, Tyson Norris <tnor...@adobe.com.INVALID>
wrote:

A couple comments on singleton:
- use of cluster singleton will introduce a new single point of failure
- from time of singleton node failure, to single resurrection on a
different instance, will be an outage from the point of view of any
ContainerRouter that does not already have a warm+free container to service
an activation
- resurrecting the singleton will require tr

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-20 Thread Tyson Norris
2018 at 02:36, Carlos Santana <
> csantan...@gmail.com> wrote:
> 
>> triggers get responded right away (202) with an activation is and then
>> sent to the queue to be processed async same as async action invokes.
>> 
>> I think we would keep same contract as today for this type of activations
>> that are eventually process different from blocking invokes including we
>> Actions were the http client hold a connection waiting for the result back.
>> 
>> - Carlos Santana
>> @csantanapr
>> 
>>> On Aug 17, 2018, at 6:14 PM, Tyson Norris 
>> wrote:
>>> 
>>> Hi -
>>> Separate thread regarding the proposal: what is considered for routing
>> activations as overload and destined for kafka?
>>> 
>>> In general, if kafka is not on the blocking activation path, why would
>> it be used at all, if the timeouts and processing expectations of blocking
>> and non-blocking are the same?
>>> 
>>> One case I can imagine: triggers + non-blocking invokes, but only in the
>> case where those have some different timeout characteristics. e.g. if a
>> trigger fires an action, is there any case where the activation should be
>> buffered to kafka if it will timeout same as a blocking activation?
>>> 
>>> Sorry if I’m missing something obvious.
>>> 
>>> Thanks
>>> Tyson
>>> 
>>> 
>> 



Re: Concurrency PR

2018-08-20 Thread Tyson Norris

To better handle the case of images that don’t support concurrency, or
don’t support log collection from invoker, I would suggest we change the
container protocol to allow containers to broadcast their support either
via the /init endpoint, or via a new /info endpoint. This of course would
not give feedback until an action is executed (as opposed to when action is
created), but I think this is ok. I will work on a separate PR for this,
but want to mention some thoughts here about possible approaches to address
these known concerns.


Why not make this part of the runtimes manifest? Handling this as late as
actually invoking the action feels kinda weird if we can just as well know
ahead of time, that creating an action with a concurrency > 1 will not work
and should therefore forbid creation at all. Any strong reason not to
encode that information into the runtimes manifest?


One thing is blackbox containers - will only managed containers be allowed to 
support concurrency? If not, how will blackbox containers advertise support?

Separately, the only thing that action creation currently does is validate 
“action build time” issues, like "do action limits fit within the allowed 
ranges”, it does not mean the input value is appropriate for execution. If the 
runtime manifest had per-image settings for all action limits, it would make 
more sense to include concurrency there - but I would have the same suggestion, 
that allowing the image to dictate the entirety of its opinions on these 
allowed ranges would be nicer to do as part of the container protocol (/init, 
/run), and not the manifest - partly for blackbox support, and partly because 
it is simpler to maintain just the image and just the manifest (which just 
references images, not the metadata of what the image supports internally).

In any case, is this (per container metadata indicating concurrency support) 
really required for supporting concurrency at all? I agree it will be nice, but 
don’t think that it should block adding support for concurrency.

WDYT?

Thanks
Tyson




Kafka and Proposal on a future architecture of OpenWhisk

2018-08-17 Thread Tyson Norris
Hi - 
Separate thread regarding the proposal: what is considered for routing 
activations as overload and destined for kafka?

In general, if kafka is not on the blocking activation path, why would it be 
used at all, if the timeouts and processing expectations of blocking and 
non-blocking are the same?

One case I can imagine: triggers + non-blocking invokes, but only in the case 
where those have some different timeout characteristics. e.g. if a trigger 
fires an action, is there any case where the activation should be buffered to 
kafka if it will timeout same as a blocking activation?  

Sorry if I’m missing something obvious.

Thanks
Tyson




Re: Proposal on a future architecture of OpenWhisk

2018-08-17 Thread Tyson Norris
Ugh my reply formatting got removed!!! Trying this again with some >>

On Aug 17, 2018, at 2:45 PM, Tyson Norris <tnor...@adobe.com.INVALID> wrote:


If the failover of the singleton is too long (I think it will be based on
cluster size, oldest node becomes the singleton host iirc), I think we need
to consider how containers can launch in the meantime. A first step might
be to test out the singleton behavior in the cluster of various sizes.


I agree this bit of design is crucial, a few thoughts:
Pre-warm wouldn't help here, the ContainerRouters only know warm
containers. Pre-warming is managed by the ContainerManager.


>> Ah right



Considering a fail-over scenario: We could consider sharing the state via
EventSourcing. That is: All state lives inside of frequently snapshotted
events and thus can be shared between multiple instances of the
ContainerManager seamlessly. Alternatively, we could also think about only
working on persisted state. That way, a cold-standby model could fly. We
should make sure that the state is not "slightly stale" but rather both
instances see the same state at any point in time. I believe on that
cold-path of generating new containers, we can live with the extra-latency
of persisting what we're doing as the path will still be dominated by the
container creation latency.



>> Wasn’t clear if you mean not using ClusterSingleton? To be clear in 
>> ClusterSingleton case there are 2 issues:
- time it takes for akka ClusterSingletonManager to realize it needs to start a 
new actor
- time it takes for the new actor to assume a usable state

EventSourcing (or ext persistence) may help with the latter, but we will need 
to be sure the former is tolerable to start with.
Here is an example test from akka source that may be useful (multi-jvm, but all 
local):
https://github.com/akka/akka/blob/009214ae07708e8144a279e71d06c4a504907e31/akka-cluster-tools/src/multi-jvm/scala/akka/cluster/singleton/ClusterSingletonManagerChaosSpec.scala

Some things to consider, that I don’t know details of:
- will the size of cluster affect the singleton behavior in case of failure? (I 
think so, but not sure, and what extent); in the simple test above it takes ~6s 
for the replacement singleton to begin startup, but if we have 100s of nodes, 
I’m not sure how much time it will take. (I don’t think this should be hard to 
test, but I haven’t done it)
- in case of hard crash, what is the singleton behavior? In graceful jvm 
termination, I know the cluster behavior is good, but there is always this 
question about how downing nodes will be handled. If this critical piece of the 
system relies on akka cluster functionality, we will need to make sure that the 
singleton can be reconstituted, both in case of graceful termination 
(restart/deployment events) and non-graceful termination (hard vm crash, hard 
container crash) . This is ignoring more complicated cases of extended network 
partitions, which will also have bad effects on many of the downstream systems.




Handover time as you say is crucial, but I'd say as it only impacts
container creation, we could live with, let's say, 5 seconds of
failover-downtime on this path? What's your experience been on singleton
failover? How long did it take?



>> Seconds in the simplest case, so I think we need to test it in a scaled case 
>> (100s of cluster nodes), as well as the hard crash case (where not downing 
>> the node may affect the cluster state).





On Aug 16, 2018, at 11:01 AM, Tyson Norris <tnor...@adobe.com.INVALID>
wrote:

A couple comments on singleton:
- use of cluster singleton will introduce a new single point of failure
- from time of singleton node failure, to single resurrection on a
different instance, will be an outage from the point of view of any
ContainerRouter that does not already have a warm+free container to service
an activation
- resurrecting the singleton will require transferring or rebuilding the
state when recovery occurs - in my experience this was tricky, and requires
replicating the data (which will be slightly stale, but better than
rebuilding from nothing); I don’t recall the handover delay (to transfer
singleton to a new akka cluster node) when I tried last, but I think it was
not as fast as I hoped it would be.

I don’t have a great suggestion for the singleton failure case, but
would like to consider this carefully, and discuss the ramifications (which
may or may not be tolerable) before pursuing this particular aspect of the
design.


On prioritization:
- if concurrency is enabled for an action, this is anothe

Re: Proposal on a future architecture of OpenWhisk

2018-08-17 Thread Tyson Norris

If the failover of the singleton is too long (I think it will be based on
cluster size, oldest node becomes the singleton host iirc), I think we need
to consider how containers can launch in the meantime. A first step might
be to test out the singleton behavior in the cluster of various sizes.


I agree this bit of design is crucial, a few thoughts:
Pre-warm wouldn't help here, the ContainerRouters only know warm
containers. Pre-warming is managed by the ContainerManager.

Ah right


Considering a fail-over scenario: We could consider sharing the state via
EventSourcing. That is: All state lives inside of frequently snapshotted
events and thus can be shared between multiple instances of the
ContainerManager seamlessly. Alternatively, we could also think about only
working on persisted state. That way, a cold-standby model could fly. We
should make sure that the state is not "slightly stale" but rather both
instances see the same state at any point in time. I believe on that
cold-path of generating new containers, we can live with the extra-latency
of persisting what we're doing as the path will still be dominated by the
container creation latency.

Wasn’t clear if you mean not using ClusterSingleton? To be clear in 
ClusterSingleton case there are 2 issues:
- time it takes for akka ClusterSingletonManager to realize it needs to start a 
new actor
- time it takes for the new actor to assume a usable state

EventSourcing (or ext persistence) may help with the latter, but we will need 
to be sure the former is tolerable to start with.
Here is an example test from akka source that may be useful (multi-jvm, but all 
local):
https://github.com/akka/akka/blob/009214ae07708e8144a279e71d06c4a504907e31/akka-cluster-tools/src/multi-jvm/scala/akka/cluster/singleton/ClusterSingletonManagerChaosSpec.scala

Some things to consider, that I don’t know details of:
- will the size of cluster affect the singleton behavior in case of failure? (I 
think so, but not sure, and what extent); in the simple test above it takes ~6s 
for the replacement singleton to begin startup, but if we have 100s of nodes, 
I’m not sure how much time it will take. (I don’t think this should be hard to 
test, but I haven’t done it)
- in case of hard crash, what is the singleton behavior? In graceful jvm 
termination, I know the cluster behavior is good, but there is always this 
question about how downing nodes will be handled. If this critical piece of the 
system relies on akka cluster functionality, we will need to make sure that the 
singleton can be reconstituted, both in case of graceful termination 
(restart/deployment events) and non-graceful termination (hard vm crash, hard 
container crash) . This is ignoring more complicated cases of extended network 
partitions, which will also have bad effects on many of the downstream systems.



Handover time as you say is crucial, but I'd say as it only impacts
container creation, we could live with, let's say, 5 seconds of
failover-downtime on this path? What's your experience been on singleton
failover? How long did it take?


Seconds in the simplest case, so I think we need to test it in a scaled case 
(100s of cluster nodes), as well as the hard crash case (where not downing the 
node may affect the cluster state).




On Aug 16, 2018, at 11:01 AM, Tyson Norris <tnor...@adobe.com.INVALID>
wrote:

A couple comments on singleton:
- use of cluster singleton will introduce a new single point of failure
- from time of singleton node failure, to single resurrection on a
different instance, will be an outage from the point of view of any
ContainerRouter that does not already have a warm+free container to service
an activation
- resurrecting the singleton will require transferring or rebuilding the
state when recovery occurs - in my experience this was tricky, and requires
replicating the data (which will be slightly stale, but better than
rebuilding from nothing); I don’t recall the handover delay (to transfer
singleton to a new akka cluster node) when I tried last, but I think it was
not as fast as I hoped it would be.

I don’t have a great suggestion for the singleton failure case, but
would like to consider this carefully, and discuss the ramifications (which
may or may not be tolerable) before pursuing this particular aspect of the
design.


On prioritization:
- if concurrency is enabled for an action, this is another
prioritization aspect, of sorts - if the action supports concurrency, there
is no reason (except for destruction coordination…) that it cannot be
shared across shards. This could be added later, but may be worth
considering since there is a general reuse problem where a series of
activations that arrives at different ContainerRouters will create a new
container in each, while they could be reused (and avoid creating new
containers) if concurrency is tolerated in that container. This would only
(ha ha) require changing how container destroy works, where it can

Concurrency PR

2018-08-17 Thread Tyson Norris
Hi -
I have been noodling with a few tests and the akka http client and gotten the 
concurrency PR [1] to a good place, I think, so if anyone can help review that 
would be appreciated.

A couple of notes:
- akka http client has some different notion of connection reuse than the 
apache client; to address this I created a separate PR [2] which, instead of 
dissuading connection reuse, simply destroys the client (and connection pool) 
when the container is paused. (This change is not reflected in 2795 FWIW). 
AFAIK the connection reuse issue only comes up with container pauses, so I 
wanted to address this where it is relevant, and not impose additional 
performance costs for concurrency cases. This client is still not enabled by 
default.
- There was mention in the comments (for 2795) about need to handle a case 
where a container doesn’t support concurrency, but the action dev has enabled 
it at the action - this PR does NOT deal with that.

To summarize, enabling concurrency requires:
- all actions may signal that they support concurrency, so all images that 
might be used would need to support concurrency, if concurrency is enabled in 
your deployment
- log collection must be handled outside of invoker (since invoker does not 
deal with interleaved log parsing)
- wsk cli will require changes to allow action devs to set the concurrency 
limits on actions (current PR only exposes the OW api for doing this); I have a 
PR queued up for that [3]. (Will need another PR for the cli once the client-go 
lib is updated)

To better handle the case of images that don’t support concurrency, or don’t 
support log collection from invoker, I would suggest we change the container 
protocol to allow containers to broadcast their support either via the /init 
endpoint, or via a new /info endpoint. This of course would not give feedback 
until an action is executed (as opposed to when action is created), but I think 
this is ok. I will work on a separate PR for this, but want to mention some 
thoughts here about possible approaches to address these known concerns.
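To make the idea concrete, here is a hypothetical shape for such a /info response and how the invoker side might consume it (Python for brevity - every field name here is illustrative, not part of the current container protocol):

```python
import json

def info_handler(image_supports_concurrency, max_concurrency):
    """Hypothetical /info endpoint body a runtime image could return."""
    body = {
        "protocolVersion": "1",
        "concurrency": {
            "supported": image_supports_concurrency,
            "max": max_concurrency if image_supports_concurrency else 1,
        },
        # Whether the runtime writes logs the invoker can parse, or expects
        # an external log collector under concurrent interleaving.
        "logs": {"invokerCollectable": not image_supports_concurrency},
    }
    return json.dumps(body)

def effective_concurrency(info_json, requested):
    """Invoker/router side: clamp the action's requested concurrency to what
    the image advertises, since creation-time validation can't know this
    (notably for blackbox images)."""
    info = json.loads(info_json)
    return min(requested, info["concurrency"]["max"])

resp = info_handler(image_supports_concurrency=True, max_concurrency=64)
allowed = effective_concurrency(resp, 200)  # clamped to the image's max
```

Because the image itself answers, this works uniformly for managed and blackbox containers, at the cost of the feedback arriving at execution time rather than at action creation.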

Thanks
Tyson


[1] https://github.com/apache/incubator-openwhisk/pull/2795
[2] https://github.com/apache/incubator-openwhisk/pull/3976
[3] https://github.com/apache/incubator-openwhisk-client-go/pull/94



Re: Proposal on a future architecture of OpenWhisk

2018-08-16 Thread Tyson Norris
Thinking more about the singleton aspect, I guess this is mostly an issue for 
blackbox containers, where manifest/managed containers will mitigate at least 
some of the singleton failure delays by prewarm/stemcell containers. 

So in the case of singleton failure, impacts would be:
- managed containers once prewarms are exhausted (may be improved by being more 
intelligent about prewarm pool sizing based on load etc)
- managed containers that don’t match any prewarms (similar - if prewarm pool 
is dynamically configured based on load, this is less of a problem)
- blackbox containers (no help)

If the failover of the singleton is too long (I think it will be based on 
cluster size, oldest node becomes the singleton host iirc), I think we need to 
consider how containers can launch in the meantime. A first step might be to 
test out the singleton behavior in the cluster of various sizes.

> On Aug 16, 2018, at 11:01 AM, Tyson Norris  wrote:
> 
> A couple comments on singleton:
> - use of cluster singleton will introduce a new single point of failure - 
> from time of singleton node failure, to single resurrection on a different 
> instance, will be an outage from the point of view of any ContainerRouter 
> that does not already have a warm+free container to service an activation
> - resurrecting the singleton will require transferring or rebuilding the 
> state when recovery occurs - in my experience this was tricky, and requires 
> replicating the data (which will be slightly stale, but better than 
> rebuilding from nothing); I don’t recall the handover delay (to transfer 
> singleton to a new akka cluster node) when I tried last, but I think it was 
> not as fast as I hoped it would be.
> 
> I don’t have a great suggestion for the singleton failure case, but would 
> like to consider this carefully, and discuss the ramifications (which may or 
> may not be tolerable) before pursuing this particular aspect of the design.
> 
> 
> On prioritization:
> - if concurrency is enabled for an action, this is another prioritization 
> aspect, of sorts - if the action supports concurrency, there is no reason 
> (except for destruction coordination…) that it cannot be shared across 
> shards. This could be added later, but may be worth considering since there 
> is a general reuse problem where a series of activations that arrives at 
> different ContainerRouters will create a new container in each, while they 
> could be reused (and avoid creating new containers) if concurrency is 
> tolerated in that container. This would only (ha ha) require changing how 
> container destroy works, where it cannot be destroyed until the last 
> ContainerRouter is done with it. And if container destruction is coordinated 
> in this way to increase reuse, it would also be good to coordinate 
> construction (don’t concurrently construct the same container for multiple 
> containerRouters IFF a single container would enable concurrent activations 
> once it is created). I’m not sure if others are desiring this level of 
> container reuse, but if so, it would be worth considering these aspects 
> (sharding/isolation vs sharing/coordination) as part of any redesign.
> 
> 
> WDYT?
> 
> THanks
> Tyson
> 
> On Aug 15, 2018, at 8:55 AM, Carlos Santana <csantan...@gmail.com> wrote:
> 
> I think we should add a section on prioritization for blocking vs. async
> invokes (non-blocking action invokes and triggers)
> 
> The front door has the luxury of knowing some intent from the incoming
> request; I feel it would make sense to give high priority to blocking invokes,
> and for async they go straight to the queue to be picked up by the system to
> eventually run, even if it takes 10 times longer to execute than a blocking
> invoke - for example a webaction would take 10ms vs. a DB trigger fire, or an
> async webhook takes 100ms.
> 
> Also the controller takes time to convert a trigger and process the rules,
> this is something that can also be taken out of the hot path.
> 
> So I'm just saying we could optimize the system because we know if the
> incoming request is a hot or hotter path :-)
> 
> -- Carlos
> 
> 



Re: Proposal on a future architecture of OpenWhisk

2018-08-16 Thread Tyson Norris
A couple comments on singleton:
- use of a cluster singleton will introduce a new single point of failure - from 
the time of singleton node failure to singleton resurrection on a different 
instance, there will be an outage from the point of view of any ContainerRouter 
that does not already have a warm+free container to service an activation
- resurrecting the singleton will require transferring or rebuilding the state 
when recovery occurs - in my experience this was tricky, and requires 
replicating the data (which will be slightly stale, but better than rebuilding 
from nothing); I don’t recall the handover delay (to transfer the singleton to a 
new akka cluster node) from when I last tried, but I think it was not as fast as 
I hoped it would be.

I don’t have a great suggestion for the singleton failure case, but would like 
to consider this carefully, and discuss the ramifications (which may or may not 
be tolerable) before pursuing this particular aspect of the design.


On prioritization:
- if concurrency is enabled for an action, this is another prioritization 
aspect, of sorts - if the action supports concurrency, there is no reason 
(except for destruction coordination…) that it cannot be shared across shards. 
This could be added later, but may be worth considering now, since there is a 
general reuse problem: a series of activations arriving at different 
ContainerRouters will create a new container in each, while they could be 
reused (avoiding new containers) if concurrency is tolerated in that 
container. This would only (ha ha) require changing how container destroy 
works, so that a container cannot be destroyed until the last ContainerRouter 
is done with it. And if container destruction is coordinated in this way to 
increase reuse, it would also be good to coordinate construction (don’t 
concurrently construct the same container for multiple ContainerRouters IFF a 
single container would enable concurrent activations once it is created). I’m 
not sure if others desire this level of container reuse, but if so, it would be 
worth considering these aspects (sharding/isolation vs sharing/coordination) as 
part of any redesign.


WDYT?

Thanks
Tyson

On Aug 15, 2018, at 8:55 AM, Carlos Santana <csantan...@gmail.com> wrote:

I think we should add a section on prioritization for blocking vs. async
invokes (non-blocking actions and triggers)

The front door has the luxury of knowing some intent from the incoming
request, so I feel it would make sense to give high priority to blocking invokes,
while async invokes go straight to the queue to be picked up by the system to
eventually run, even if that takes 10 times longer than a blocking
invoke; for example, a webaction would take 10ms vs. a DB trigger fire, or an
async webhook taking 100ms.

Also, the controller takes time to convert a trigger and process the rules;
this is something that can also be taken out of the hot path.

So I'm just saying we could optimize the system because we know if the
incoming request is a hot or hotter path :-)

-- Carlos




Re: logging baby step -- worth pursuing?

2018-08-15 Thread Tyson Norris
Hi - 
FWIW This won’t help with concurrent activations since the logs from concurrent 
activations will be interleaved (I think Dave was not suggesting to use this 
for concurrent activations). It will only help in the case where log processing 
is done outside of the invoker, and logs are not interleaved from multiple 
activations. 
I’m not sure having a start sentinel is simpler than just including the 
activation id in the existing sentinel line (end of log segment, not the 
beginning), but it would probably be simpler for a human to read.

If people use blackbox actions, and if blackbox containers have different log 
collection than managed actions, I think that would be a reason to not do 
anything until there is better support for structured logging, since if you are 
still using the invoker to collect blackbox logs, you might as well use it to 
collect all logs. The majority of log collection may not be blackbox, so you 
could get some efficiencies there, but the added mess of multiple log 
collection approaches may bring different problems (my logs behave differently 
for different types of actions, etc).

One option might be to allow the /init endpoint to return some details about 
the container image, so that it can hint how it expects logs to be handled (if 
at all) at the invoker - currently /init response is only interpreted in case 
of a non-200 response. This same approach may be useful for other optional 
facilities like support of concurrency or gpu, where the container can signal 
it’s support and fail early if there is a mismatch with the action being 
executed. This would not resolve the different behavior problem, but would 
provide a smooth transition for older blackbox images.

Thanks
Tyson

> On Aug 14, 2018, at 2:49 PM, Dragos Dascalita Haut 
>  wrote:
> 
> "...we should be able to fully
> process the logs offline and in a streaming manner and get the needed
> activation id injected into every logline..."
> 
> 
> +1 IIRC for concurrent activations Tyson Norris and Dan McWeeney were going 
> down this path as well. Having this natively supported by all OpenWhisk 
> runtimes can only make things easier.
> 
> 
> From: David P Grove 
> Sent: Tuesday, August 14, 2018 2:29:12 PM
> To: dev@openwhisk.apache.org
> Subject: logging baby step -- worth pursuing?
> 
> 
> 
> Even if we think structured logging is the right eventual goal, it could
> take a while to get there (especially since it is changing functionality
> users may have grown accustomed to).
> 
> However, for non-concurrent, non-blackbox runtimes we could make a small,
> not-user visible change, that could enable fully offline and streaming log
> processing.  We already generate an end-of-log sentinel to stdout/stderr
> for these runtimes.  If we also generated a start-of-log sentinel to
> stdout/stderr that included the activation id, we should be able to fully
> process the logs offline and in a streaming manner and get the needed
> activation id injected into every logline.
> 
> Is this worth pursuing?   I'm motivated to get log processing out of the
> Invoker/ContainerRouter so we can push ahead with some of the scheduler
> redesign. Without tackling logging, I don't think we'll be able to assess
> the true scalability potential of the new scheduling architectures.
> 
> --dave
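
The start/end sentinel scheme above can be sketched as a tiny offline enricher (the sentinel strings and record shape here are illustrative, not the actual OpenWhisk sentinels): once a start sentinel carries the activation id, each subsequent line can be tagged without any handshake back to the invoker.

```javascript
// Illustrative offline log enricher: a start sentinel carrying the
// activation id plus an end sentinel bracket each activation's output.
// Sentinel names and output shape are assumptions, not OpenWhisk's format.
const START = 'START_WHISK_ACTIVATION_LOG';
const END = 'END_WHISK_ACTIVATION_LOG';

function enrich(lines) {
  const out = [];
  let activationId = null;
  for (const line of lines) {
    if (line.startsWith(START)) {
      // everything after the start sentinel is the activation id
      activationId = line.slice(START.length).trim();
    } else if (line.startsWith(END)) {
      activationId = null; // leaving the bracketed segment
    } else if (activationId !== null) {
      out.push({ activationId, line });
    }
  }
  return out;
}
```

Because the parse is purely sequential, it can run as a streaming post-processor (fluentd/logstash style) fully decoupled from the invoker's critical path.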



Re: System env vars in user containers

2018-08-06 Thread Tyson Norris
So what are the invoker changes that will leverage these runtime changes? I’m 
not sure that context was part of the thread yet, sorry if it was.

> On Aug 6, 2018, at 10:52 AM, Carlos Santana  wrote:
> 
> To avoid scope creep.
> 
> I implemented what Vadim proposed to make the runtime a bit more flexible,
> and allow the invoker to pass more env variables, so it is up to the invoker
> how much or how little to pass.
> These changes do not require changes to invoker code or user action code;
> they are backward compatible.
> 
> Out of scope:
> 1. changing the structure of the body for /run
> 2. splitting between /init and /run is a good thing to consider.
> 3. enhancing runtimes to allow more action signatures with context object
> 
> Here are the proposed changes:
> *Update runtimes for Apache:*
> - Nodejs6, Nodejs8:
> https://github.com/apache/incubator-openwhisk-runtime-nodejs/compare/master...csantanapr:iam_flex_env_vars?expand=1
> - Docker Skeleton:
> https://github.com/apache/incubator-openwhisk-runtime-docker/compare/master...csantanapr:iam_flex_env_vars?expand=1
> - Python2, Python3:
> https://github.com/apache/incubator-openwhisk-runtime-python/compare/master...csantanapr:iam_flex_env_vars?expand=1
> - Swift3, Swift4:
> https://github.com/apache/incubator-openwhisk-runtime-swift/compare/master...csantanapr:iam_flex_env_vars?expand=1
> - Java8
> https://github.com/apache/incubator-openwhisk-runtime-java/compare/master...csantanapr:iam_flex_env_vars?expand=1
> - Ruby
> https://github.com/apache/incubator-openwhisk-runtime-ruby/compare/master...csantanapr:iam_flex_env_vars?expand=1
> - Ballerina
> https://github.com/apache/incubator-openwhisk-runtime-ballerina/compare/master...csantanapr:iam_flex_env_vars?expand=1
> 
> if I don't see a -1, I would submit PRs.
> 
> -- Carlos
> 
> 
> On Mon, Aug 6, 2018 at 12:17 PM Rodric Rabbah  wrote:
> 
>>> Ideally we need to remove most of our env vars from /run to /init, I
>> agree.
>>> But wouldn't it be a breaking change then?
>>> 
>> 
>> I don't think so - the first time a container is started, the values are
>> provided on init.
>> On run, the values that change would be provided (activation id, deadline).
>> 
>> 
>>> Coming back to my original question, I'm ok with leaving the env vars on
>>> the root level of the json object. I think Carlos already has PRs that
>>> could enable that functionality.
>>> 
>> 
>> Yes I think this is fine - would permit more vars to be passed on as env
>> vars as well in the future.
>> 
>> -r
>> 



Re: System env vars in user containers

2018-08-06 Thread Tyson Norris
Other than parsing JSON (and compatibility), does setting env vars instead of 
propagating a JSON object into the /run handler give any benefit?

Since any runtime that wants to support concurrency won’t leverage most of 
these vars during /init (except perhaps the action name?), and won’t leverage 
env vars during /run, I would hesitate to add “more env vars”, but I’m not sure 
of the use case or the PRs being mentioned. 

I would keep it simpler and:
- allow containers to continue using env vars as is for compatibility, but skip 
this processing if concurrency is enabled (I can check into this for nodejs 
container)
- containers that support concurrency will need to just pass the JSON object 
(e.g. the req body, sans "value") as the “context” to /run

This of course changes the sig from main(params) to main(params, context), but 
this would not impact compatibility afaik.
Alternatively, as Chetan mentions, we can merge these to a single arg with 
“special keys”. I don’t have a strong opinion on which of these is implemented, 
just that env vars won’t be usable in concurrency cases (just like other forms 
of “global vars”).

Thanks
Tyson 



> On Aug 6, 2018, at 9:17 AM, Rodric Rabbah  wrote:
> 
>> Ideally we need to remove most of our env vars from /run to /init, I agree.
>> But wouldn't it be a breaking change then?
>> 
> 
> I don't think so - the first time a container is started, the values are
> provided on init.
> On run, the values that change would be provided (activation id, deadline).
> 
> 
>> Coming back to my original question, I'm ok with leaving the env vars on
>> the root level of the json object. I think Carlos already has PRs that
>> could enable that functionality.
>> 
> 
> Yes I think this is fine - would permit more vars to be passed on as env
> vars as well in the future.
> 
> -r
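
For illustration, passing the per-activation values on the root level of the /run body roughly amounts to splitting that body into the action's parameters and everything else as a "context". A sketch, with assumed field names (the actual proxy protocol may differ):

```javascript
// Illustrative split of a /run request body into main(params, context):
// everything except "value" becomes a per-activation context instead of
// env vars, which cannot work once activations run concurrently in one
// container. Field names (activation_id, deadline) are assumptions.
function splitRunBody(body) {
  const { value, ...context } = body;
  return { params: value || {}, context };
}
```

A runtime supporting concurrency would then call the action as `main(params, context)` per request instead of setting process-wide env vars.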



Re: Proposal on a future architecture of OpenWhisk

2018-07-20 Thread Tyson Norris
On logging: I think if you are considering enabling concurrent activation 
processing, you will find that the only way to associate parsed logs with a 
specific activationId is to force the log output to be structured and always 
include the activationId with every log message. This requires a change at the 
action container layer, but the simpler thing to do is to encourage action 
containers to provide a structured logging context that action developers can 
(and must) use to generate logs. 

An example is nodejs container - for the time being, we are hijacking the 
stdout/stderr and injecting the activationId when any developer code writes to 
stdout/stderr (as console.log/console.error). This may not work as simply in 
all action containers, and isn’t great even in nodejs. 

I would rather encourage action containers to provide a logging context, where 
action devs use log.info, log.debug, etc, and this logging context does what is 
needed to impose some structure on the log format. In general, many (most?) 
languages have conventions (slf4xyz, et al) for this already, and while you 
lose “random writes to stdout”, I haven’t seen this be an actual problem. 

If you don’t deal with interleaved logs (typically because activations don’t 
run concurrently), then this is less of an issue, but regardless, writing log 
parsers is a solved problem that would still be good to offload to external 
(not in OW controller/invoker) systems (logstash, fluentd, splunk, etc). This 
obviously comes with the caveat that log parsing will be delayed, but that is OK 
from my point of view, partly because most logs will never be viewed, and 
partly because the log ingest systems are mostly fast enough already to limit 
this delay to seconds or milliseconds.  

Thanks
Tyson
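
A sketch of such a logging context (hypothetical structure, not the nodejs runtime's actual code): the runtime hands the action a logger bound to the current activationId, so every line stays attributable even when activations interleave in one container.

```javascript
// Illustrative structured logging context. Instead of hijacking stdout,
// the runtime would construct one of these per activation and pass it to
// the action. The record shape and function names are assumptions.
function makeLogger(activationId, sink) {
  const write = level => (...args) =>
    sink(JSON.stringify({ activationId, level, msg: args.join(' ') }));
  return { info: write('info'), error: write('error') };
}
```

In a real runtime, `sink` would write to stdout (or directly to a log driver); here it is injectable so the output can be inspected.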
> On Jul 20, 2018, at 8:46 AM, David P Grove  wrote:
> 
> 
> Rethinking the architecture to more fully exploit the capabilities of the
> underlying container orchestration platforms is pretty exciting.  I think
> there are lots of interesting ideas to explore about how best to schedule
> the workload.
> 
> As brought out in the architecture proposal [1], although it is logically
> an orthogonal issue, improving the log processing for user containers is a
> key piece of this roadmap.  The initial experiences with the
> KubernetesContainerFactory indicate that post-facto log enrichment to add
> the activation id to each log line is a serious bottleneck.  It adds
> complexity to the system and measurably reduces system performance by
> delaying the re-use of action containers until the logs can be extracted
> and processing.
> 
> I believe what we really want is to be using an openwhisk-aware log driver
> that will dynamically inject the current activation id into every log line
> as soon as it is written.  Then the user container logs, already properly
> enriched when they are generated, can be fed directly into the platform
> logging system with no post-processing needed.
> 
> If the low-level container runtime is docker 17.09 or better, I think we
> could probably achieve this by writing a logging driver plugin [2] that
> extends docker's json logging driver.  For non-blackbox containers, I think
> we "just" need the /run method to update a shared location that is
> accessible to the logging driver plugin with the current activation id
> before it invokes the user code.  As log lines are produced, that location
> is read and the string with the activation id gets injected into the json
> formatted log line as it is produced.   For blackbox containers, we could
> have our dockerskeleton do the same thing, but the user would have to opt
> in somehow to the protocol if they were using their own action runner.
> Warning:  I haven't looked into how flushing works with these drivers, so
> I'm not sure that this really workswe need to make sure we don't enrich
> a log line with the wrong activation id because of delayed flushing.
> 
> If we're running on Kubernetes, we might decide that instead of using a
> logging driver plugin, to use a streaming sidecar container as shown in [3]
> and have the controller interact with the sidecar to update the current
> activation id (or have the sidecar read it from a shared memory location
> that is updated by /run to minimize the differences between deployment
> platforms).  I'm not sure this really works as well, since the sidecar
> might fall behind in processing the logs, so we might still need a
> handshake somewhere.
> 
> A third option would be to extend our current sentineled log design by also
> writing a "START_WHISK_ACTIVATION_LOG " line in the /run
> method before invoking the user code.  We'd still have to post-process the
> log files, but it could be decoupled from the critical path since the
> post-processor would have the activation id available to it in the log
> files (and thus would not need to handshake with the controller at all,
> thus we could offload all logging to a node-level 

Re: Concurrency in invoker http client

2018-07-14 Thread Tyson Norris
Right, got it.

Got an initial impl working here:
https://github.com/apache/incubator-openwhisk/pull/3812/files#diff-3fca67626fe5f92ce431902dd2318a10R170

So now in that PR, the entity is either consumed via Unmarshal().to[String], 
when within the allowed response size, or else via the truncated() function, 
when content length is too large.

Thanks
Tyson

On Jul 14, 2018, at 5:04 AM, Markus Thoemmes <markus.thoem...@de.ibm.com> wrote:

Hi Tyson,

I think that is (kinda) expected. The ByteStrings are coming through in chunks, 
usually in sizes like 4k or 8k depending on the underlying connection. You 
won't know beforehand how big these chunks are.

Have you tried with a big response (like 1M?). The chunk should not be 1M in 
size, in theory at least.

Cheers,
Markus
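
The chunked consumption Markus describes can be shown in miniature (purely illustrative; the real code operates on akka-stream ByteStrings, not Buffers): entity bytes arrive as arbitrarily sized chunks, so a correct consumer accumulates until a byte budget is reached and truncates the remainder.

```javascript
// Illustrative chunk accumulator: consume chunks of unpredictable size
// up to maxBytes, then stop and report truncation. Not OpenWhisk code.
function consumeUpTo(chunks, maxBytes) {
  let taken = Buffer.alloc(0);
  let truncated = false;
  for (const chunk of chunks) {
    const buf = Buffer.from(chunk);
    if (taken.length + buf.length <= maxBytes) {
      taken = Buffer.concat([taken, buf]); // whole chunk fits the budget
    } else {
      // keep only the part of this chunk that fits, drop the rest
      taken = Buffer.concat([taken, buf.slice(0, maxBytes - taken.length)]);
      truncated = true;
      break;
    }
  }
  return { body: taken.toString('utf8'), truncated };
}
```

One simplification to note: if the budget is filled exactly by a chunk boundary and more chunks follow, this sketch would not flag truncation; a stream-based version would keep one element of lookahead to detect that.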




Re: Concurrency in invoker http client

2018-07-13 Thread Tyson Norris
BTW, trying prefixAndTail, this doesn’t work for me
e.g.
response.entity.dataBytes
  .prefixAndTail(1)
  .runWith(Sink.head)
  .map({
case (Seq(r), _) =>
  Right(ContainerResponse(response.status.intValue, r.utf8String, 
Some(contentLength.B, maxResponse)))
  })

This does not result in r.size == 1; rather r.size == response.entity.size

I will try to fiddle with it some more(and ignore the tail)


> On Jul 13, 2018, at 5:29 PM, Tyson Norris  wrote:
> 
> Thanks
> 
> Currently HttpUtils does return (almost) the same response when it is 
> truncated… so that the truncated version can later be included in the 
> ErrorResponse…
> 
> I would be OK with changing the error message to not include ANY of the 
> response, since either way it is an error message. 
> Then the error message would ONLY be:
> "The action produced a response that exceeded the allowed length: 
> ${length.toBytes} > ${maxLength.toBytes} bytes.”
> Instead of 
> "The action produced a response that exceeded the allowed length: 
> ${length.toBytes} > ${maxLength.toBytes} bytes.  The truncated response was: 
> $trunk"
> 
> WDYT?
> 
> 
>> On Jul 13, 2018, at 5:21 PM, Markus Thoemmes  
>> wrote:
>> 
>> Hi Tyson,
>> 
>> Chetan solved a similar problem in "inlineOrAttach" in 
>> "AttachmentSupport.scala". He did this by continuously running a stream 
>> through "prefixAndTail", where he'd pick just one element and then run the 
>> stream to check whether he already crossed the limit.
>> 
>> This case now is a lot more performance critical, so I propose we add a 
>> "prefixAndTailWeighted" stage, which runs similarly to "prefixAndTail" but 
>> weighs the inputs by their size so you can define how much you actually 
>> want to consume in bytes. You can then run the "tail" stream to an ignore 
>> Sink. Further, I'd propose to implement a "quick path" (like the one in 
>> the current HttpUtils), which checks the "Content-Length" field of the 
>> response and just consumes the whole stream into a string if it's safe to do 
>> so to only use this special method on the truncation path.
>> 
>> As a more general, existential question: Do we even need the truncation 
>> path? Could we just deny the response straight away if the user's action 
>> returns a bigger value than allowed?
>> 
>> Hope this helps
>> 
>> Cheers,
>> Markus
>> 
> 



Re: Concurrency in invoker http client

2018-07-13 Thread Tyson Norris
Thanks

Currently HttpUtils does return (almost) the same response when it is 
> truncated… so that the truncated version can later be included in the 
ErrorResponse…

I would be OK with changing the error message to not include ANY of the 
response, since either way it is an error message. 
Then the error message would ONLY be:
"The action produced a response that exceeded the allowed length: 
${length.toBytes} > ${maxLength.toBytes} bytes.”
Instead of 
"The action produced a response that exceeded the allowed length: 
${length.toBytes} > ${maxLength.toBytes} bytes.  The truncated response was: 
$trunk"

WDYT?
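
For comparison, the two candidate messages side by side (the function itself is illustrative; the wording mirrors the quoted strings):

```javascript
// Illustrative error-message builder for the oversized-response case.
// With no truncatedBody, it produces the shorter message proposed above;
// with one, the current message that embeds the truncated response.
function responseTooLarge(length, maxLength, truncatedBody) {
  const base =
    `The action produced a response that exceeded the allowed length: ` +
    `${length} > ${maxLength} bytes.`;
  return truncatedBody === undefined
    ? base
    : `${base} The truncated response was: ${truncatedBody}`;
}
```

Dropping the truncated body would also remove the only remaining need to buffer any of the oversized response for the error path.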
 

> On Jul 13, 2018, at 5:21 PM, Markus Thoemmes  
> wrote:
> 
> Hi Tyson,
> 
> Chetan solved a similar problem in "inlineOrAttach" in 
> "AttachmentSupport.scala". He did this by continuously running a stream 
> through "prefixAndTail", where he'd pick just one element and then run the 
> stream to check whether he already crossed the limit.
> 
> This case now is a lot more performance critical, so I propose we add a 
> "prefixAndTailWeighted" stage, which runs similarly to "prefixAndTail" but 
> weighs the inputs by their size so you can define how much you actually want 
> to consume in bytes. You can then run the "tail" stream to an ignore Sink. 
> Further, I'd propose to implement a "quick path" (like the one in the 
> current HttpUtils), which checks the "Content-Length" field of the response 
> and just consumes the whole stream into a string if it's safe to do so to 
> only use this special method on the truncation path.
> 
> As a more general, existential question: Do we even need the truncation path? 
> Could we just deny the response straight away if the user's action returns a 
> bigger value than allowed?
> 
> Hope this helps
> 
> Cheers,
> Markus
> 



Re: Concurrency in invoker http client

2018-07-13 Thread Tyson Norris
Getting back to this, I’m close, but not sure how to get akka http client to 
truncate a response?

There is response.entity.withSizeLimit() - but this throws an exception, not 
really what I want. 

I think what I want is a variant of the HttpEntity.Limitable GraphStage that 
just stops emitting characters, but I’m not sure how exactly to do this?

Thanks
Tyson

> On Jun 25, 2018, at 10:32 AM, Rodric Rabbah  wrote:
> 
> the retries are only on failure to establish a connection - no other
> retries should be happening iirc.
> 
> -r
> 
> On Mon, Jun 25, 2018 at 1:29 PM Tyson Norris 
> wrote:
> 
>> Thanks Markus - one other question:
>> 
>> Assuming retry is the current missing piece to using PoolingRestClient (or
>> akka http directly), I’m also wondering if “retry” is the proper approach
>> here?
>> It may be worthwhile to initiate a port connection (with its own
>> timeout/retry behavior) before the /init so that we can distinguish between
>> “container startup is slow” and “bad behavior in action container after
>> startup”?
>> 
>> Also, I’m wondering if there are cases where rampant retry causes
>> unintended side affects — unintended side effects, etc - this could be worse with concurrency
>> enabled, but I don’t know if this should be considered a real problem.
>> 
>> FWIW We avoid this (http request to container that is not yet listening)
>> in mesos by not returning the container till the mesos health check passes
>> (which currently just checks the port connection), so this would be a
>> similar setup at the invoker layer.
>> 
>> Thanks
>> Tyson
>> 
>>> On Jun 25, 2018, at 10:08 AM, Markus Thoemmes <
>> markus.thoem...@de.ibm.com> wrote:
>>> 
>>> Hi Tyson,
>>> 
>>> Ha, I was thinking about moving back to akka the other day. A few
>> comments:
>>> 
>>> 1. Travis build environments have 1.5 CPU cores which might explain the
>> "strange" behavior you get from the apache client? Maybe it adjusts its
>> thread pool based on the number of cores available?
>>> 2. To implement retries based on akka-http, have a look at what we used
>> to use for invoker communication:
>> https://github.com/apache/incubator-openwhisk/commit/31946029cad740a00c6e6f367637a1bcfea5dd18#diff-5c6f165d3e8395b6fe915ef0d24e5d1f
>> (NewHttpUtils.scala to be precise).
>>> 3. I guess you're going to create a new PoolingRestClient per container?
>> I could imagine it is problematic if new containers come and go with a
>> global connection pool. Just something to be aware of.
>>> 
>>> Oh, another thing to be aware of: We **used** to have akka-http there
>> and never tried again after the revert. We're certainly on a much newer
>> version now but we had issues of indefinitely hanging connections when we
>> first implemented it. Running some high-load scenarios before pushing this
>> into master will be needed.
>>> 
>>> I don't want to put you off the task though, give it a shot, I'd love to
>> have this back :). Thanks for attacking!
>>> 
>>> Cheers,
>>> -m
>>> 
>> 
>> 



Re: Concurrency in invoker http client

2018-06-25 Thread Tyson Norris
Thanks Markus - one other question: 

Assuming retry is the current missing piece to using PoolingRestClient (or akka 
http directly), I’m also wondering if “retry” is the proper approach here?
It may be worthwhile to initiate a port connection (with its own timeout/retry 
behavior) before the /init so that we can distinguish between “container 
startup is slow” and “bad behavior in action container after startup”?

Also, I’m wondering if there are cases where rampant retry causes unintended 
side effects, etc - this could be worse with concurrency enabled, but I don’t 
know if this should be considered a real problem. 

FWIW We avoid this (http request to container that is not yet listening) in 
mesos by not returning the container till the mesos health check passes (which 
currently just checks the port connection), so this would be a similar setup at 
the invoker layer.

Thanks
Tyson
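
The "probe the port first, then init once" idea can be sketched like this (connectFn and initFn are stand-ins for a real TCP probe and the /init HTTP call; the function name is made up):

```javascript
// Illustrative retry policy: retry only the cheap connection probe while
// the container is starting up, and issue /init exactly once afterwards,
// so a misbehaving action container is never re-initialized by accident.
function initWithConnectRetry(connectFn, initFn, maxAttempts) {
  let lastErr;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      connectFn(); // e.g. open/close a TCP connection to the action port
    } catch (err) {
      lastErr = err; // container still starting up; probe again
      continue;
    }
    return initFn(); // listening now: /init is sent exactly once
  }
  throw lastErr; // container never started listening
}
```

This separates "container startup is slow" (probe fails, retry) from "bad behavior after startup" (init/run fails, surface the error), which a blanket HTTP retry cannot distinguish.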

> On Jun 25, 2018, at 10:08 AM, Markus Thoemmes  
> wrote:
> 
> Hi Tyson,
> 
> Ha, I was thinking about moving back to akka the other day. A few comments:
> 
> 1. Travis build environments have 1.5 CPU cores which might explain the 
> "strange" behavior you get from the apache client? Maybe it adjusts its 
> thread pool based on the number of cores available?
> 2. To implement retries based on akka-http, have a look at what we used to 
> use for invoker communication: 
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-openwhisk%2Fcommit%2F31946029cad740a00c6e6f367637a1bcfea5dd18%23diff-5c6f165d3e8395b6fe915ef0d24e5d1f=02%7C01%7Ctnorris%40adobe.com%7Ca07af0966c50438a270908d5dabe4b3a%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C1%7C636655433199601572=ZHjbf0ukNQaEFq2i58f5hxMt3zRa3JCHdHR0MRAn8Uo%3D=0
>  (NewHttpUtils.scala to be precise).
> 3. I guess you're going to create a new PoolingRestClient per container? I 
> could imagine it is problematic if new containers come and go with a global 
> connection pool. Just something to be aware of.
> 
> Oh, another thing to be aware of: We **used** to have akka-http there and 
> never tried again after the revert. We're certainly on a much newer version 
> now but we had issues of indefinitely hanging connections when we first 
> implemented it. Running some high-load scenarios before pushing this into 
> master will be needed.
> 
> I don't want to put you off the task though, give it a shot, I'd love to have 
> this back :). Thanks for attacking!
> 
> Cheers,
> -m
> 



Concurrency in invoker http client

2018-06-25 Thread Tyson Norris
Hi -
As part of the support for concurrency in action containers, I have this PR 
open in the nodejs action container: 
https://github.com/apache/incubator-openwhisk-runtime-nodejs/pull/41

I’ve been having real trouble getting concurrency tests to operate properly in 
travis env.

The behavior appears as though the http client is only allowing 2 
concurrent requests - additional requests are processed later. This only 
happens in the travis env, unfortunately.

A couple of questions:
* does anyone have advice for the org.apache.http.client to make it behave in 
travis env? Using PoolingHttpClientConnectionManager for concurrency locally 
works swell - it only gives me grief in travis, and only by processing requests 
in “more serial fashion” than when I run it locally.
* at this point, I’m actively looking at replacing org.apache.http.client 
(which has been discussed in the past) with PoolingRestClient (wrapper for akka 
http), but I’m not sure how to best extend it to support retries - any pointers 
here?
* Related to PoolingRestClient: A non-retry use of PoolingRestClient works 
great locally with 100s of concurrent requests, but fails only in travis with 
"akka.stream.StreamTcpException: Tcp command 
[Connect(172.17.0.9:8080,None,List(),Some(10 seconds),true)] failed because of 
Connection refused” - I assume because the container takes longer to startup in 
travis, and requires retries?

Thanks for any tips!
Tyson


Tech Interchange meeting tomorrow

2018-06-05 Thread Tyson Norris
Hi All -
Reminder: the biweekly call is tomorrow 10 am GMT-5:00 (US Central Time), 
please send any agenda items to the list or myself.
Google cal link below with call details.

Dominic - do you want to continue your presentation from 2 weeks ago?

Current agenda to also include:
* Priti Desai -  to discuss enhancements to wskdeploy
* Vincent - to discuss Apache (source) release progress with focus on JDK 
considerations.

Thanks
Tyson


https://calendar.google.com/calendar/r/eventedit/copy/M3RmZG04YW12cXVib2xwYzFycmEzYmFicGNfMjAxODA1MjNUMTUwMDAwWiBhcGFjaGVvcGVud2hpc2tAbQ/dHlzb25ub3JyaXNAZ21haWwuY29t?scp=ALL=AKUaPmYgpqvJrIPapxAhgpPqCtZr3uZXPgEH0o4LhuRu2k3JFiZjkq4BxBHGsLWVpl8P0Sy6jYLel9AYjRmJ3OizyEB21gEcHA%3D%3D=true=xml


Re: New scheduling algorithm proposal.

2018-05-25 Thread Tyson Norris
not located in its
> local.
>
> But it will also introduce some synchronization issues among controllers
> and invokers, or it needs segregation between resource-based scheduling at
> the controller and real invocation.
> In the earlier case, since the controller will schedule activations based on
> resource status, it is required to synchronize them in real time. Since
> invokers can send requests to any remote containers, there will be a
> mismatch in resource status between controllers and invokers.
>
> In the latter case, the controller should be able to send requests to any
> invoker, and the invoker will then schedule the activations.
> In this case too, invokers need to synchronize their container status
> among themselves.
>
> In the situation where all invokers have the same resource status, if two
> invokers receive the same action invocation requests, it's not easy to
> control the traffic among them, because they will schedule requests to the
> same containers. And if we take an approach similar to what you suggested,
> to send an intent to use the containers first, it will introduce increasing
> latency overhead as more and more invokers join the cluster.
> I couldn't find any good way to handle this yet. And this is why I
> proposed an autonomous ContainerProxy to enable location-free scheduling.
>
> Finally, regarding SPI, yes you are correct that ContainerProxy is highly
> dependent on ContainerPool; I will update my PR as you suggested.
>
> Thanks
> Regards
> Dominic.
>
>
> 2018-05-18 2:22 GMT+09:00 Tyson Norris <tnor...@adobe.com.invalid>:
>
>> Hi Dominic -
>>
>> I share similar concerns about an unbounded number of topics, despite
>> testing with 10k topics. I’m not sure a topic being considered active vs
>> inactive makes a difference from broker/consumer perspective? I think there
>> would minimally have to be some topic cleanup that happens, and I’m not
>> sure the impact of deleting topics in bulk will have on the system either.
>>
>> A couple of tangent notes related to container reuse to improve
>> performance:
>> - I’m putting together the concurrent activation PR[1] (to allow reuse of
>> a warm container for multiple activations concurrently); this can improve
>> throughput for those actions that can tolerate it (FYI per-action config is
>> not implemented yet). It suffers from a similar inaccuracy in Kafka message
>> ingestion at the invoker (“how many messages should I read?”). But I think we
>> can tune this a bit by adding some intelligence to Invoker/MessageFeed like “if
>> I never see ContainerPool indicate it is busy, read more next time” - that
>> is, allow ContainerPool to backpressure MessageFeed based on ability to
>> consume, and not (as today) strictly on consume+process.
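The backpressure idea above can be sketched roughly like this (a toy Python model, not OpenWhisk's actual Scala code — the class names echo ContainerPool/MessageFeed from the discussion, but the capacity and batch-sizing policy are invented purely for illustration):

```python
class ContainerPool:
    """Toy pool that accepts a bounded number of in-flight activations."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = 0
        self.was_busy = False  # did the pool ever refuse work since last poll?

    def try_accept(self):
        # Accept an activation if there is capacity; otherwise record "busy".
        if self.in_flight < self.capacity:
            self.in_flight += 1
            return True
        self.was_busy = True
        return False

    def complete(self):
        self.in_flight -= 1


class MessageFeed:
    """Sizes the next Kafka read by whether the pool signalled 'busy'.

    If the pool never refused work during the previous batch, read more next
    time; if it did, back off. The doubling/halving policy is illustrative.
    """

    def __init__(self, pool, initial_batch=4, max_batch=64):
        self.pool = pool
        self.batch = initial_batch
        self.max_batch = max_batch

    def next_batch_size(self):
        if self.pool.was_busy:
            self.batch = max(1, self.batch // 2)  # pool pushed back: shrink
        else:
            self.batch = min(self.max_batch, self.batch * 2)  # grow
        self.pool.was_busy = False  # reset for the next observation window
        return self.batch
```

The point is only that the feed's read size is driven by whether the pool ever had to refuse work, rather than strictly by consume+process completion.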
>>
>> - Another variant we are investigating is putting a ContainerPool into
>> Controller. This will prevent container reuse across controllers (bad!),
>> but will bypass kafka(good!). I think this will be plausible for actions
>> that support concurrency, and may be useful for anything that runs as
>> blocking to improve a few ms of latency, but I’m not sure of all the
>> ramifications yet.
>>
>>
>> Another (more far out) approach that combines some of these is changing the
>> “scheduling” concept to be more about resource reservation and garbage
>> collection. Specifically, the ContainerPool could be a combination of
>> self-managed resources AND remote managed resources. If no proper (warm)
>> container exists locally or remotely, a self-managed one is created, and
>> advertised. Other ContainerPool instances can leverage the remote resources
>> (containers). To pause or remove a container requires advertising intent to
>> change state, and giving clients time to veto. So there is some added
>> complication in the start/reserver/pause/rm container lifecycle, but the
>> case for reuse is maximized in best case scenario (concurrency tolerant
>> actions) and concurrency intolerant actions have a chance to leverage a
>> broader pool of containers (iff the ability to reserve a shared available
>> container is fast enough, compared to starting a new cold one). There is a
>> lot wrapped in there (how are resources advertised, what are the new states
>> of lifecycle, etc), so take this idea with a grain of salt.
>>
>>
>> Specific to your PR: do you need an SPI for ContainerProxy? Or can it
>> just be designated by the ContainerPool impl to use a specific
>> ContainerProxy variant? I think these are now and will continue to be tied
>> closely together, so would manage them as a single SPI.
>>
>> Thanks
>> Tyson
>>
>> [1] 
>> https://na01.safelinks.prote

Re: Notes & Video posted for 2018-05-23 web meeting

2018-05-24 Thread Tyson Norris
Hi - 
Dominic, if you are able to upload your presentation to the wiki, please let us 
know?

Thanks!
Tyson

> On May 23, 2018, at 12:13 PM, Matt Rutkowski  wrote:
> 
> 
> Thanks for hosting Priti!  Enjoyed seeing the agenda prepared on-screen 
> filled with lots of topics.
> 
> CWIKI: https://cwiki.apache.org/confluence/display/OPENWHISK/2018-05-23+OW+Tech+Interchance+-+Meeting+Notes
> YouTube: 
> https://youtu.be/cgictUeK-Vk
> 
> NOTE: please update your calendars back to Zoom for the next call... I will 
> update the Google calendar and Wiki shortly.
> 
> Thanks to Dominic for presenting the first part of his sched. algo. proposal 
> which we plan to continue on the dev. list as well as on the next meeting.
> 
> Cheers!
> MR



Re: Video+Notes uploaded from today's OW Tech interchange

2018-04-27 Thread Tyson Norris
FYI, I attached my presentation (concurrent activations) to the wiki page here:
https://cwiki.apache.org/confluence/download/attachments/80452012/concurrent%20activations.pdf?version=1&modificationDate=1524869707107&api=v2

> On Apr 26, 2018, at 5:45 PM, Carlos Santana  wrote:
> 
> Thanks Michelle I added the link to the presentation in the wiki for the
> meeting
> https://cwiki.apache.org/confluence/display/OPENWHISK/2018-04-25+OW+Tech+Interchange+-+Meeting+Notes
> 
> -- Carlos
> 
> On Thu, Apr 26, 2018 at 5:47 PM Michele Sciabarra 
> wrote:
> 
>> I uploaded the presentation here.
>> 
>> 
>> https://www.slideshare.net/MicheleSciabarr/openwhisk-goswiftbinaries-runtime
>> 
>> I am not sure where to put it in the wiki, so here you go the link for
>> some expert wikist to paste in the right place.
>> 
>> --
>>  Michele Sciabarra
>>  openwh...@sciabarra.com
>> 
>> On Wed, Apr 25, 2018, at 11:06 PM, Carlos Santana wrote:
>>> Great meeting today,  just finished watching
>>> Sorry I messed up my clock and attended late.
>>> 
>>> kudos to presenters Tony and Michelle, you guys came prepared, raising the
>>> bar for all of us +1
>>> 
>>> Looking forward to more deep dive discussions, particularly runtime
>> changes
>>> and tooling.
>>> 
>>> Dragos !!! where is my "wsk api" on docker-compose :-) ?
>>> 
>>> 
>>> -- Carlos
>>> 
>>> 
>>> On Wed, Apr 25, 2018 at 2:50 PM Chetan Mehrotra <
>> chetan.mehro...@gmail.com>
>>> wrote:
>>> 
 Thanks for the notes Matt!.
 
 @Tyson and @Michele would be helpful to have the presentations added to
 wiki
 Chetan Mehrotra
 
 
 On Thu, Apr 26, 2018 at 12:02 AM, Matt Rutkowski <
>> mrutkow...@apache.org>
 wrote:
> Thanks James for hosting and to Andy Steed for "volunteering" to host
 May 9th meeting (as well as welcome for being a new contributor).
> 
> YouTube: 
> https://youtu.be/TlIMt90TvpI
> CWiki:
 
>> https://cwiki.apache.org/confluence/display/OPENWHISK/2018-04-25+OW+Tech+Interchange+-+Meeting+Notes
> 
> Cheers!
> - MR
 
>> 



Re: UPCOMING TECH INTERCHANGE (April 25th): Agenda items?

2018-04-23 Thread Tyson Norris
I’m revisiting enablement of concurrent activation processing - I’d like to 
provide a short list of steps for getting there, and ask for input on some 
nuances that this brings up, like how to enable this option “per action” or 
“per image”.

> On Apr 23, 2018, at 4:21 AM, James Thomas  wrote:
> 
> Hello all.
> 
> It's the bi-weekly tech interchange this week (April 25th). Can people post
> agenda items here or message me in the Slack group? Reminder to invite
> other whiskers to the call.
> 
> *Details:*
> What: Apache OpenWhisk "Tech. Interchange" (bi-weekly) Zoom Meeting
> When: April 25 @ 11:00am EDT, 8am PDT, 4pm GMT, 5pm CEST , 3pm UTC
> Where: 
> https://zoom.us/my/asfopenwhisk
> 
> *Wearing of comedy hats on the call is encouraged...*
> 
> -- 
> Regards,
> James Thomas



Re: Transactionid in the ErrorResponse

2018-04-19 Thread Tyson Norris
I would prefer changing the schema from number to string. (And breaking any 
apps that use it) 

Until we have a better notion of release and versioning (including all API, db, 
and message schemas), I’m not too concerned about breaking API changes at this 
level (code is only useful for OW operators - so I have some doubts about 
whether it is actually used anywhere).

I don’t think this particular breaking change warrants an API version number 
change at this point in time, but I agree at some point in the future it will.

Tyson
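For illustration, the compatibility risk being discussed is the usual strict-type one: a client that checks the `code` field against a number stops matching once the field becomes a string. A Python sketch of the idea (the same applies to `===` checks in JavaScript; the JSON payloads mirror the examples quoted later in the thread):

```python
import json

# Error responses before and after the proposed schema change.
old_response = json.loads('{"error": "This is the description", "code": 123}')
new_response = json.loads('{"error": "This is the description", "code": "123"}')


def is_tid(resp, tid=123):
    """A type-sensitive check written against the old numeric schema."""
    return resp["code"] == tid


print(is_tid(old_response))  # True under the old schema
print(is_tid(new_response))  # False: "123" != 123, the check silently breaks
```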
 

> On Apr 19, 2018, at 8:51 AM, Nick Mitchell  wrote:
> 
> this seems like a breaking API change. e.g. in nodejs `===` checks would
> break.
> 
> On Thu, Apr 19, 2018 at 11:37 AM, Rodric Rabbah  wrote:
> 
>> Should we also rename “code”?
>> 
>> I don’t see the value in using code: 0 and changing the schema should be
>> fine and better in the long run.
>> 
>> -r
>> 
>>> On Apr 19, 2018, at 11:31 AM, Christian Bickel 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm currently working on a PR which basically moves the transactionId
>> generation from the controller to the entrypoint of the system. This is the
>> nginx or a frontdoor above.
>>> One change in this PR is to change the format of the tid from a number
>> to a String.
>>> This works pretty well except one change, that could be seen by users.
>>> If there is an error in our system, we return an error response with a
>> short description and the tid. Until now the tid was a number, so the value
>> in the JSON has no quotes. With this change, the response message would
>> change, because the tid is a String.
>>> This means the response would change from
>>> ```
>>> {
>>> "error": "This is the description",
>>> "code": 123
>>> }
>>> ```
>>> to
>>> ```
>>> {
>>> "error": "This is the description",
>>> "code": "123"
>>> }
>>> ```.
>>> 
>>> Do you agree, that this change would be OK?
>>> An alternative would be to always return a 0 and add an additional field
>> with our new tid-format.
>>> 
>>> If there are no concerns, I'll go ahead and change the field from the
>> number to a String.
>>> 
>>> Greetings
>>> Christian Bickel
>> 



Re: Sending activation metadata to Kafka

2018-04-17 Thread Tyson Norris
I took a brief look at the PR - it looks like “the prior approach” of sending 
to couchdb is still enabled, is that correct?

If so, it may be worthwhile to make the reference impl store to couchdb, and 
remove the activation persistence from controller/invoker?

This would also imply that the “polling” done by the controller would also need 
to be replaced?
Thanks
Tyson

On Apr 17, 2018, at 12:02 PM, Rodric Rabbah wrote:

It would be useful to provide a reference implementation for consuming this 
data. Can you also capture the goals in an issue (I scanned the PR quickly but 
there’s no corresponding issue).

Further now I think there are multiple ways of recording/reporting some of the 
metrics (log markers which are largely silenced by default, kamon metrics, and 
now kafka). Is that right?

I think we’ll need to also document these and a guide for when to use which - I 
caution that we are proliferating multiple ways of doing similar things with no 
consistency or articulated long term vision.

-r

On Apr 17, 2018, at 12:01 PM, Vadim Raskin wrote:

Hi Chetan,

Can you share some details on how this is currently being done with
CouchDB. Do we have any analytics view configured which computes these
numbers currently?

To my knowledge we don't have the views that are shared anywhere in open
repos.

Maybe we also include a basic default implementation out of the box
which collects aggregated stats using the Kamon metrics already being used

Not a bad idea, I'll consider sharing the piece of code after finishing the
development (separate from the original PR); it might require some
post-processing to strip away the IBM-specific parts, so it might take some
time.

regards,
Vadim.



On Tue, Apr 17, 2018 at 8:29 AM Chetan Mehrotra wrote:

Hi Vadim,

This looks helpful to get better insight in runtime operational stats!

It has some advantages over the prior approach (sending them to
CouchDB).

Can you share some details on how this is currently being done with
CouchDB. Do we have any analytics view configured which computes these
numbers currently?

Now it would be possible to simply connect a custom micro service
to Kafka and consume the activations in real-time.

Maybe we also include a basic default implementation out of the box
which collects aggregated stats using the Kamon metrics already being used
Chetan Mehrotra


On Mon, Apr 16, 2018 at 9:51 PM, Vadim Raskin wrote:
Hi everyone,


I’ve just opened a PR that enables sending activation metadata to Kafka.
It has some advantages over the prior approach (sending them to
CouchDB). Now it would be possible to simply connect a custom micro
service
to Kafka and consume the activations in real-time. Some of the use cases
it
might cover: activation metrics - collect the data and push them into a
custom time-series database; user activity audit; activation analytics -
potentially get some insights with KSQL.


At the moment I’ve created a new kafka topic called events, which will
include messages from Controllers and Invokers. It encompasses the
following data collected from a single activation:


concurrentActivations
throttledActivations
statusCode
initTime
waitTime
duration
kind


Probably some more metadata will be added to this list soon.
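For illustration, a consumer of such a topic could fold these fields into simple aggregates. The sketch below works on already-decoded JSON strings (in a real deployment they would come off the new Kafka topic via any consumer client; the field names follow the list above, and the sample records are made up):

```python
import json
from collections import Counter


def summarize(events):
    """Aggregate activation metadata records into simple stats."""
    stats = {"count": 0, "total_duration": 0,
             "by_kind": Counter(), "by_status": Counter()}
    for raw in events:
        e = json.loads(raw)
        stats["count"] += 1
        stats["total_duration"] += e.get("duration", 0)
        stats["by_kind"][e.get("kind", "unknown")] += 1
        stats["by_status"][e.get("statusCode", -1)] += 1
    return stats


# Hypothetical event payloads with the fields listed above.
sample = [
    '{"kind": "nodejs:8", "duration": 120, "statusCode": 0, "waitTime": 5}',
    '{"kind": "nodejs:8", "duration": 80,  "statusCode": 0, "initTime": 300}',
    '{"kind": "python:3", "duration": 40,  "statusCode": 1}',
]
s = summarize(sample)
print(s["count"], s["total_duration"], s["by_kind"]["nodejs:8"])  # 3 240 2
```

A time-series database or KSQL query could of course do the same aggregation downstream; this only shows how little glue a real-time consumer needs.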


Just wanted to give a short heads up here. The PR I mentioned:
https://github.com/apache/incubator-openwhisk/pull/3552


Thank you,


Vadim.


