RE: Dangers of renaming and removing runtime kinds

2019-09-16 Thread Sven Lange-Last
Hello Dave,

I absolutely agree that all adopters running Apache OpenWhisk as a private
or public production offering will, or even should, have their own runtimes
manifest, as we do at IBM.

At the same time, we run the Apache OpenWhisk test suite against our IBM
version of the system. When action kinds change in this test suite ("java"
to "java:8"), this requires some work on our side. I admit that's our
problem.

With my proposal to improve documentation, I wanted to make adopters aware
of what runtime changes mean. Even if adopters have their own version of
the runtimes manifest, I guess they start with a copy of the Apache
OpenWhisk default manifest. So when they set up their runtimes manifest,
they will hopefully keep the new description, which makes maintainers of
the file aware that removing runtime kinds needs to be planned carefully.



Regards,

Sven Lange-Last
Senior Software Engineer
IBM Cloud Functions
Apache OpenWhisk


E-mail: sven.lange-l...@de.ibm.com

Schoenaicher Str. 220
Boeblingen, 71032
Germany




IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Böblingen / Register court: Amtsgericht Stuttgart,
HRB 243294










Re: Dangers of renaming and removing runtime kinds

2019-09-16 Thread David P Grove



"Sven Lange-Last" wrote on 09/16/2019 01:51:11 PM:
>
> I opened PR #4627 to improve documentation. Said PR also adds
> "documentation" to the pre-defined Openwhisk runtime manifest files to
> make developers aware that renaming or removing runtime kinds can cause
> problems.
>

Hi Sven,

This is useful to write down.  It should be an item in a best
practice guideline for operators of OpenWhisk deployments.

I think the community assumption is that all downstream OpenWhisk
operators are maintaining their own internal versions of runtimes.json
precisely because they need absolute control over their set of supported
runtimes.  And because they don't actually use the default runtimes.json,
they should be insulated and able to consume all schema-preserving upstream
changes related to runtimes.json at their own pace.

It is a good point that the community could have made it more obvious
to downstream operators, by leaving behind a deprecated kind for some period
of time, that PR #4390 contained a change they needed to consume carefully.
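A minimal Python sketch of the deprecated-kind idea, under stated assumptions: after a rename, the manifest keeps the old kind as a deprecated entry, so existing actions still resolve while new actions must use the new name. The `RuntimeKind` type, the `deprecated` flag semantics, the image names, and both resolver functions are hypothetical illustrations, not OpenWhisk's actual manifest loader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeKind:
    kind: str
    image: str
    deprecated: bool = False  # hypothetical flag: keep running, refuse new actions


# Hypothetical manifest after renaming "java" to "java:8": the old name is
# left behind as a deprecated entry pointing at the same (illustrative) image.
MANIFEST = {
    r.kind: r
    for r in (
        RuntimeKind("java:8", "example/java8-runtime"),
        RuntimeKind("java", "example/java8-runtime", deprecated=True),
        RuntimeKind("nodejs:10", "example/nodejs10-runtime"),
    )
}


class UnknownKindError(Exception):
    pass


def resolve_for_create(kind: str) -> RuntimeKind:
    """New actions may only use kinds that exist and are not deprecated."""
    runtime = MANIFEST.get(kind)
    if runtime is None or runtime.deprecated:
        raise UnknownKindError(f"kind '{kind}' is not available for new actions")
    return runtime


def resolve_for_existing(kind: str) -> RuntimeKind:
    """Existing actions keep working as long as their kind is still listed."""
    runtime = MANIFEST.get(kind)
    if runtime is None:
        raise UnknownKindError(f"kind '{kind}' was removed from the manifest")
    return runtime
```

Once operators have migrated their existing `java` actions, the deprecated entry could be dropped in a later release.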

--dave


Re: Dangers of renaming and removing runtime kinds

2019-09-16 Thread Rodric Rabbah
I don't think there is actually a distinction between the two paths in
deserialize(). The try path throws the exception inside docReader.read(),
whereas in the catch path the exception is deferred to the actual type check
that occurs on lines 64-67. The exceptions should arguably be the same; I
suspect we can eliminate the try/catch (caveat: it's been a while since I
looked at that code carefully).

The reason the deserializer is the way it is, and why the order matters, is
that the type of the record is not stored in the document, so the
deserializer relies on schema matches to deserialize a given document. An
action and a trigger have similar schemas if you eliminate the exec
property from the former. Perhaps the db interface should address that too
(i.e., record the type in the document, since by default there is only one
db for all assets).
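To illustrate the suggestion of recording the type in the document, here is a minimal Python sketch in which each stored document carries its entity type, so the loader picks one reader instead of trying schemas in order. The `entityType` field name, the reader functions, and the small kind set are hypothetical; this is not the actual OpenWhisk store code.

```python
import json

KNOWN_KINDS = {"java:8", "nodejs:10"}  # illustrative subset of a runtimes manifest


def read_action(doc: dict) -> dict:
    kind = doc["exec"]["kind"]
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown runtime kind '{kind}'")
    return {"name": doc["name"], "kind": kind}


def read_trigger(doc: dict) -> dict:
    # A trigger has no "exec"; with schema matching alone, an action document
    # minus "exec" would also look like a trigger.
    return {"name": doc["name"]}


READERS = {"action": read_action, "trigger": read_trigger}


def save(entity_type: str, fields: dict) -> str:
    """Hypothetical: persist the entity type alongside the fields."""
    return json.dumps({"entityType": entity_type, **fields})


def load(raw: str) -> tuple:
    """Dispatch on the recorded type; a reader error surfaces unchanged."""
    doc = json.loads(raw)
    entity_type = doc.pop("entityType")
    return entity_type, READERS[entity_type](doc)
```

With the type recorded, an action with an unknown kind fails in the action reader with its own error instead of being matched against the trigger schema.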

-r



Dangers of renaming and removing runtime kinds

2019-09-16 Thread Sven Lange-Last
Hello Openwhisk community members,

Recently, PR #4390 [1] renamed runtime kind "java" to "java:8". While a
change like this looks harmless at first sight, it breaks all existing
actions of this kind. This may not matter for developers and
"occasional" usage of OpenWhisk, but it affects production deployments.
Production deployments with existing actions require additional migration
steps when runtime kinds are renamed or removed.

I opened PR #4627 [2] to improve documentation. That PR also adds
"documentation" to the predefined OpenWhisk runtime manifest files to
make developers aware that renaming or removing runtime kinds can cause
problems.

There is another area that should be improved, but I need help to
understand it better.


When trying to create an action with a kind that does not exist, a
reasonable error message is returned:

$ wsk action create kind-does-not-exist tests/dat/actions/hello.js --kind nodejs:4
error: Unable to create action 'kind-does-not-exist': The request content was malformed:
kind 'nodejs:4' not in Set(dotnet:2.2, go:1.11, nodejs:10, ballerina:0.990,
ruby:2.5, nodejs:18, blackbox, swift:4.2, java:8, sequence, nodejs:6,
nodejs:12, python:3, python:2, php:7.3) (code 33bfb55ce44d1dd0bc6e662c49ea9391)


When trying to display the metadata of an action whose kind does not
exist, the resulting error message is not helpful at all:

$ wsk action get kind-does-not-exist
error: Unable to get action 'kind-does-not-exist': Resource by this name exists
but is not in this collection. (code 4761468230c344417fd61cdca5922e52)


* My conclusion from looking into the controller logs and code is that
deserialization of the ExecMetaDataBase object fails with a
DeserializationException [3].
* This exception fails the "try" block in StoreUtils.deserialize(), leading
to a fall-back read in the "catch" block. This fall-back read seems to
return a WhiskTrigger instead of a WhiskActionMetaData, so a
DocumentTypeMismatchException is thrown [4].
  The resulting message can be found in the controller logs: "document type
class org.apache.openwhisk.core.entity.WhiskTrigger did not match expected
type class org.apache.openwhisk.core.entity.WhiskActionMetaData.".
* As a result, getEntity() [5] fails with the misleading error message
mentioned above and HTTP status code 409 (Conflict).
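The failure path described above can be reproduced with a small Python model. It is only an analogy of StoreUtils.deserialize(): the classes, the lenient trigger schema, and the generic fallback reader are simplified stand-ins assumed for illustration, not the real Scala code.

```python
class DeserializationException(Exception):
    pass


class DocumentTypeMismatchException(Exception):
    pass


KNOWN_KINDS = {"java:8", "nodejs:10"}  # illustrative subset of the manifest


class WhiskActionMetaData:
    def __init__(self, doc: dict):
        kind = doc["exec"]["kind"]
        if kind not in KNOWN_KINDS:
            raise DeserializationException(f"kind '{kind}' not in {sorted(KNOWN_KINDS)}")
        self.name, self.kind = doc["name"], kind


class WhiskTrigger:
    def __init__(self, doc: dict):
        # Lenient schema: only the name is required, "exec" is ignored.
        self.name = doc["name"]


def generic_read(doc: dict):
    """Fallback: return the first entity type whose schema accepts the document."""
    for cls in (WhiskActionMetaData, WhiskTrigger):
        try:
            return cls(doc)
        except (KeyError, DeserializationException):
            continue
    raise DeserializationException("no entity schema matches")


def deserialize(doc: dict, expected: type):
    try:
        return expected(doc)
    except DeserializationException:
        # The original error is dropped here; only the type check remains.
        entity = generic_read(doc)
        if not isinstance(entity, expected):
            raise DocumentTypeMismatchException(
                f"document type {type(entity).__name__} did not match "
                f"expected type {expected.__name__}"
            )
        return entity
```

In this model the unknown-kind error from the action reader never reaches the caller; what surfaces is the trigger/action type mismatch.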

Can somebody explain why [4] has a fall-back and which scenarios are 
addressed by this?

In our scenario, ExecMetaDataBase should probably throw an 
UnknownRuntimeKindException and StoreUtils.deserialize() should not have a 
fall-back for this exception.
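A sketch of the proposed behaviour, again as a simplified Python model: the action reader raises a dedicated `UnknownRuntimeKindException` (the name proposed above; assumed here, not taken from the code base) and `deserialize()` re-raises it instead of falling back. The reader and the `fallback` parameter are hypothetical.

```python
class DeserializationException(Exception):
    pass


class UnknownRuntimeKindException(DeserializationException):
    """Proposed: the document is an action, but its kind is not in the manifest."""


KNOWN_KINDS = {"java:8", "nodejs:10"}  # illustrative subset of the manifest


def read_action(doc: dict) -> dict:
    kind = doc["exec"]["kind"]
    if kind not in KNOWN_KINDS:
        raise UnknownRuntimeKindException(
            f"action '{doc['name']}' has unknown kind '{kind}'"
        )
    return {"name": doc["name"], "kind": kind}


def deserialize(doc: dict, reader, fallback):
    try:
        return reader(doc)
    except UnknownRuntimeKindException:
        # No fall-back: the document is recognisably an action, so report why it failed.
        raise
    except DeserializationException:
        return fallback(doc)
```

Making the new exception a subclass of `DeserializationException` keeps existing handlers working while letting `deserialize()` treat this one case specially.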


[1] https://github.com/apache/openwhisk/pull/4390
[2] https://github.com/apache/openwhisk/pull/4627
[3] https://github.com/apache/openwhisk/blob/2036548e62dbf959d91c2328e86318bd7cfa656f/common/scala/src/main/scala/org/apache/openwhisk/core/entity/Exec.scala#L445-L450
[4] https://github.com/apache/openwhisk/blob/2036548e62dbf959d91c2328e86318bd7cfa656f/common/scala/src/main/scala/org/apache/openwhisk/core/database/StoreUtils.scala#L58-L67
[5] https://github.com/apache/openwhisk/blob/be1e3a19c02956c9be85023a0bb0ff399c21444d/core/controller/src/main/scala/org/apache/openwhisk/core/controller/ApiUtils.scala#L148-L150


Regards,

Sven Lange-Last




Re: Backpressure for slow activation storage in Invoker

2019-09-16 Thread Tyson Norris


On 9/16/19, 8:32 AM, "Chetan Mehrotra"  wrote:

> Hi Tyson,
>
> > in case of logs NOT in db: when queue full, publish non-blocking to
> > "completed-non-blocking"
>
> The approach I was thinking was to completely disable (configurable)
> support for persisting activation from Invoker and instead handle all
> such work via activation persister service.

That sounds fine. I thought there was a suggestion to optimize the
storage path by diverting to Kafka only when the in-memory queue is full. I
agree it is simpler to treat everything the same.
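When everything is treated the same, a single pattern subscription may be enough for the persister. The Python sketch below only checks which topic names the "completed-.*" pattern from this thread would select; the topic names are hypothetical, and it assumes full-match semantics for the pattern (Kafka itself takes a Java regex via `KafkaConsumer.subscribe(Pattern, ConsumerRebalanceListener)`).

```python
import re

# Pattern discussed in this thread for the activation persister service.
COMPLETED_PATTERN = re.compile(r"completed-.*")

# Hypothetical topic names: two controller result topics, the proposed
# non-blocking topic, the "activations" topic from option CB2, and an invoker topic.
TOPICS = ["completed-0", "completed-1", "completed-non-blocking", "activations", "invoker0"]


def subscribed_topics(pattern: re.Pattern, topics: list) -> list:
    """Topics a pattern subscription would pick up, assuming full-match semantics."""
    return [t for t in topics if pattern.fullmatch(t)]


print(subscribed_topics(COMPLETED_PATTERN, TOPICS))
# ['completed-0', 'completed-1', 'completed-non-blocking']
```

Because "completed-non-blocking" also matches the pattern, mode CB1 would need only this one subscription, while the "activations" topic of mode CB2 would be consumed separately.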

Thanks
Tyson   





Re: Please submit topics for this week's (Wed. 18th) Tech. Interchange call!

2019-09-16 Thread Tyson Norris
Hi Matt,
Please add: Dan McWeeney to present some prototype code related to the
execution design discussion.

Thanks!
Tyson 





OpenWhisk Execution Design

2019-09-16 Thread Tyson Norris
Hi –
Here is a more detailed document on the execution design that I briefly
discussed at the last meeting.
https://docs.google.com/document/d/1A8IyQ2Zjjl6WPc41DBWJa28bp7jEs46bvXVO_H77yBY/edit?usp=sharing

Please review and comment. Dan McWeeney will give a brief demo of some
prototype code at this week’s meeting.

Related: so that a PR with experimental code can be submitted to the core
repo, Dan submitted a PR to exclude a directory from code scanning.
https://github.com/apache/openwhisk-utilities/pull/71
https://github.com/apache/openwhisk-utilities/pull/71

Thanks
Tyson



Re: Backpressure for slow activation storage in Invoker

2019-09-16 Thread Chetan Mehrotra
Hi Tyson,

> in case of logs NOT in db: when queue full, publish non-blocking to 
> "completed-non-blocking"

The approach I was thinking was to completely disable (configurable)
support for persisting activation from Invoker and instead handle all
such work via activation persister service.

Supporting a queue-full-based approach is tricky: since we store the
activation after the active ack, it would be hard to tell which activations
in the Kafka "completed" topic were sent because the queue was full.
Otherwise, ContainerProxy would first have to place the item in the queue,
check whether the queue is full, and then add a marker to the activation
sent on the "completed" topic to indicate that it is an overflow case.
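The ordering constraint above can be sketched in Python: the marker can only be set if the activation is offered to the in-memory store queue before the result is published on the "completed" topic. The `Activation` record, its `overflow` field, `ResultPath`, and the `publish` callback are hypothetical, not ContainerProxy's actual API.

```python
import queue
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Activation:
    activation_id: str
    result: dict
    overflow: bool = False  # hypothetical marker: "persist me from Kafka"


class ResultPath:
    def __init__(self, capacity: int, publish: Callable[[str, Activation], None]):
        self.store_queue = queue.Queue(maxsize=capacity)  # local storage backlog
        self.publish = publish

    def complete(self, activation: Activation) -> bool:
        # 1. Try the local store queue first; this decides the marker.
        try:
            self.store_queue.put_nowait(activation)
            queued = True
        except queue.Full:
            queued = False
        # 2. Only then send the result, flagged if the persister must store it.
        self.publish("completed", replace(activation, overflow=not queued))
        return queued
```

This extra step in the result path is what treating every activation the same, via the persister service, would avoid.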

Chetan Mehrotra

On Fri, Sep 13, 2019 at 3:14 AM Tyson Norris  wrote:
>
> I think this sounds good, but I want to be sure I understand the consumers
> and producers involved. Is this summary correct?
>
> Controller:
> * consumes "completed-" topic (as usual)
> Invoker:
> * in case of logs NOT in db: when queue full, publish non-blocking to
> "completed-non-blocking"
> * in case of logs in db: when queue full, publish all to "Activations" topic
> OverflownActivationRecorderService (new service):
> * in case of logs NOT in db: consumes "completed-*" topic(s) AND
> "completed-non-blocking" topic
> * in case of logs in db: consumes "Activations" topic
>
> Thanks!
> Tyson
>
> On 9/11/19, 4:51 AM, "Chetan Mehrotra"  wrote:
>
> As part of implementing this feature, I came across support for topic
> patterns in Kafka [1] [2]. It seems to allow a single consumer or a group
> of consumers to listen to multiple topics. So after discussing with Sven
> (thanks Sven!) I came up with the following proposal.
>
> With this, I think we can go back to "Option B1 - Activations via
> controller topic" and thus subscribe to the "completed-.*" pattern.
>
> This would avoid any extra load on Kafka, as we consume the
> same activation result messages that are sent to the Controller. However,
> there are a few caveats:
>
> 1. Currently we send the activation result via Kafka only for blocking calls.
> 2. The result sent does not contain logs.
>
> So we could support 2 modes.
>
> Option CB1 - Existing topic + new topic for non-blocking result
> ---
>
> This mode would be used if the setup does not record the logs in the db.
> In this mode, we would add support in the Invoker to also send the result
> for non-blocking calls to a new "completed-non-blocking" topic, and then
> listen for "completed-.*".
>
> Option CB2 - New topic + KafkaActivationStore
> --
> This mode can be used if the setup stores logs in the db. Here we would
> have a new KafkaActivationStore that sends the activations to a new
> "activations" topic.
>
> The ActivationPersister service can support both modes, and the cluster
> operator can configure it in the required mode.
> Chetan Mehrotra
> [1] https://doc.akka.io/docs/alpakka-kafka/current/subscription.html#topic-pattern
> [2] https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#subscribe-java.util.regex.Pattern-org.apache.kafka.clients.consumer.ConsumerRebalanceListener-
>
> On Mon, Jun 24, 2019 at 11:57 PM Chetan Mehrotra
>  wrote:
> >
> > > For B1, we can scale out the service as controllers are scaled out, but it
> > > would be much more complex to manually assign topics.
> >
> > Yes, that's what my concern was with B1. So for now I would target the B2
> > approach, where we have a dedicated new topic that is consumed
> > by a new service. If it poses problems down the line, we can go
> > for B1.
> >
> > Chetan Mehrotra
> >
> > On Tue, Jun 25, 2019 at 10:08 AM Dominic Kim  
> wrote:
> > >
> > > Let me share a few ideas on them.
> > >
> > > Regarding option B1, I think it can scale out better than option B2.
> > > If I understood correctly, scaling out of the service will be highly
> > > dependent on Kafka.
> > > Since the number of consumers is limited to the number of partitions, the
> > > number of service nodes will also be limited to the number of partitions.
> > >
> > > So in the case of B2, if we create a new topic with some number of partitions,
> > > we cannot scale out the service nodes beyond that.
> > > At some point, we may need to alter the number of pa

Please submit topics for this week's (Wed. 18th) Tech. Interchange call!

2019-09-16 Thread Matt Rutkowski
Hello Whiskers!

Please submit items for agenda for this Wednesday’s (Sept 18) Tech Interchange 
call.

Some topics I already have "penciled in" include:

  * Proposal for new Tech. Int. Meeting time(s) - Dom
  * JVM Pre-cache optimization work in Java runtime - Matt
  * OpenWhisk Tekton Pipeline update - Priti

Looking forward!
Matt

Day-Time: Wednesday Sept 18, 11AM EDT (Eastern US), 5PM CEST (Central Europe), 
3PM GMT, 11PM (Beijing)
Zoom: https://zoom.us/my/asfopenwhisk


RE: [DISCUSS}: release "cli group" of projects

2019-09-16 Thread Matt Rutkowski
+1


Thank you, Chetan. wskdeploy has had a few bug fixes and is due for a release.

Kind regards,
Matt 




From:   Chetan Mehrotra 
To: dev@openwhisk.apache.org
Date:   09/15/2019 11:23 PM
Subject:[EXTERNAL] Re: [DISCUSS}: release "cli group" of projects



+1 for version 1.0 for cli projects
Chetan Mehrotra

On Sat, Sep 14, 2019 at 5:43 AM Carlos Santana wrote:
>
> +1  and version 1.0
>
> - Carlos Santana
> @csantanapr
>
> > On Sep 13, 2019, at 10:46 AM, Rodric Rabbah  wrote:
> >
> > +1 for 1.0
> >
> >> On Fri, Sep 13, 2019 at 10:23 AM David P Grove wrote:
> >>
> >>
> >>
> >> I'd like to make a release of the 3 "cli group" projects:
> >> openwhisk-client-go, openwhisk-wskdeploy, openwhisk-cli.
> >>
> >> The main motivation is to pick up the fix for a bug [1] in wskdeploy, which
> >> causes the `wsk project` subcommand to crash in some common usage scenarios
> >> in the 0.10.0 release.
> >>
> >> It looks to me like the current master branch is stable with no pending PRs
> >> that need to be merged. If I missed something, please comment on this
> >> thread.
> >>
> >> One item for discussion is whether we should number this release as 0.11.0
> >> or go ahead and call it 1.0.0. To me it seems like the cli api is fairly
> >> stable, so going to a 1.x.y numbering seems plausible. But I don't work on
> >> the cli tools, so I might be overlooking a reason to stay with a 0.x
> >> number.
> >>
> >> thanks,
> >>
> >> --dave
> >>
> >> [1] https://github.com/apache/openwhisk-wskdeploy/issues/1050

> >>







testing activation polling on/off

2019-09-16 Thread Rodric Rabbah
When invoking an action, the controller waits on a promise of the result to
complete in one of two ways: an active ack (response from the invoker) or
polling the activations database. The latter is protected by a deployment
flag and may not be enabled. However, our tests did not cover both cases,
with and without database polling.

I opened a PR to address this: https://github.com/apache/openwhisk/pull/4623
As a side note, the PR also moves the time the controller waits before it
terminates the HTTP response into a deployment configuration. This has the
added benefit that some tests which took 1 minute each can now run with
custom time limits (which I set to a few seconds).
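A minimal asyncio sketch of the waiting logic described above, assuming two awaitable sources for the result. The function and parameter names, including `max_wait` standing in for the configurable wait time, are hypothetical and not the controller's Scala API.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def wait_for_result(
    active_ack: Callable[[], Awaitable[str]],
    poll_db: Callable[[], Awaitable[str]],
    db_polling_enabled: bool,
    max_wait: float,
) -> Optional[str]:
    """Return the first result from the active ack or, if enabled, DB polling."""
    tasks = [asyncio.ensure_future(active_ack())]
    if db_polling_enabled:
        tasks.append(asyncio.ensure_future(poll_db()))
    done, pending = await asyncio.wait(
        tasks, timeout=max_wait, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations finish
    if not done:
        return None  # timed out: the caller stops waiting and responds without the result
    return done.pop().result()
```

Running the same test once with `db_polling_enabled=True` and once with `False` mirrors the two cases described above, and a small `max_wait` keeps such tests short.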


-r