RE: Dangers of renaming and removing runtime kinds
Hello Dave,

I absolutely agree that all adopters running Apache OpenWhisk as a private or public production offering will, or even should, have their own runtimes manifest, as we do at IBM. At the same time, we run the Apache OpenWhisk test suite against our IBM version of the system. When action kinds change in this test suite ("java" to "java:8"), this requires some work on our side. I admit that's our problem.

With my proposal to improve documentation, I wanted to make adopters aware of what runtime changes mean. Even if adopters have their own version of the runtimes manifest, I guess they start with a copy of the Apache OpenWhisk default manifest. So when they set up their runtime manifest, they hopefully keep the new description, which makes maintainers of the file aware that removal of runtime kinds needs to be planned carefully.

Regards,

Sven Lange-Last
Senior Software Engineer
IBM Cloud Functions
Apache OpenWhisk

E-mail: sven.lange-l...@de.ibm.com
Schoenaicher Str. 220
Boeblingen, 71032
Germany

IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Böblingen / Commercial register: Amtsgericht Stuttgart, HRB 243294

From: "David P Grove"
To: dev@openwhisk.apache.org
Date: 2019/09/17 00:31
Subject: [EXTERNAL] Re: Dangers of renaming and removing runtime kinds

"Sven Lange-Last" wrote on 09/16/2019 01:51:11 PM:
> I opened PR #4627 to improve documentation. Said PR also adds
> "documentation" to the pre-defined Openwhisk runtime manifest files to
> make developers aware that renaming or removing runtime kinds can cause
> problems.

Hi Sven,

This is useful to write down. It should be an item in a best practice guideline for operators of OpenWhisk deployments.
Re: Dangers of renaming and removing runtime kinds
"Sven Lange-Last" wrote on 09/16/2019 01:51:11 PM: > > I opened PR #4627 to improve documentation. Said PR also adds > "documentation" to the pre-defined Openwhisk runtime manifest files to > make developers aware that renaming or removing runtime kinds can cause > problems. > Hi Sven, This is useful to write down. It should be an item in a best practice guideline for operators of OpenWhisk deployments. I think the community assumption is that all downstream OpenWhisk operators are maintaining their own internal versions of runtimes.json precisely because they need absolute control over their set of supported runtimes. And because they don't actually use the default runtimes.json, they should be insulated and able to consume all schema-preserving upstream changes related to runtimes.json at their own pace. It is a good point that the community could have made it more obvious to downstream operators that there was a change they needed to consume carefully in PR#4390 by leaving behind a deprecated kind for some period of time. --dave
Re: Dangers of renaming and removing runtime kinds
I don't think there is actually a distinction between the two paths in deserialize(). The try path throws the exception inside docReader.read() whereas in the catch, the exception is deferred to the actual type check that occurs on lines 64-67. The exceptions should arguably be the same - I suspect we can eliminate the try/catch (caveat: it's been a while since I looked at that code carefully).

The reason the deserializer is the way it is, and the order matters, is that the type of the record is not recorded in the document, so the deserializer relies on schema matches to deserialize a given document. An action and a trigger are similar in schema - if you eliminate the exec property from the former. Perhaps the db interface should address that too (i.e., record the type in the document, since by default there is only one db for all assets).

-r

On Mon, Sep 16, 2019 at 1:51 PM Sven Lange-Last wrote:
> Hello Openwhisk community members,
>
> recently, PR #4390 [1] renamed runtime kind "java" to "java:8". While a
> change like this looks harmless at the first sight, it breaks all existing
> actions of this kind. This may not be important for developers and
> "occasional" usage of Openwhisk - but it affects production deployments.
> Production deployments with existing actions require additional migration
> steps when renaming or removing runtime kinds.
>
> I opened PR #4627 to improve documentation. Said PR also adds
> "documentation" to the pre-defined Openwhisk runtime manifest files to
> make developers aware that renaming or removing runtime kinds can cause
> problems.
>
> There is another area that should be improved - but I need help to better
> understand this area...
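The effect of schema-based deserialization can be sketched in a few lines. This is a simplified model under stated assumptions, not the actual StoreUtils code: the reader functions, the exception classes, and the reduced schemas (an action needs a name and an exec with a known kind, a trigger needs only a name) are invented for illustration.

```python
KNOWN_KINDS = {"java:8", "nodejs:10", "python:3"}  # illustrative subset


class DeserializationError(Exception):
    """A document does not fit the schema of the requested type."""


class DocumentTypeMismatch(Exception):
    """A document fits some schema, but not the expected one."""


def read_action(doc):
    # An action needs a name and an exec section with a known kind.
    kind = doc.get("exec", {}).get("kind")
    if "name" not in doc or kind not in KNOWN_KINDS:
        raise DeserializationError(f"kind '{kind}' not in {sorted(KNOWN_KINDS)}")
    return "action"


def read_trigger(doc):
    # A trigger only needs a name here; extra fields such as exec are ignored.
    if "name" not in doc:
        raise DeserializationError("missing name")
    return "trigger"


READERS = {"action": read_action, "trigger": read_trigger}


def deserialize(doc, expected):
    try:
        return READERS[expected](doc)
    except DeserializationError as original:
        # Fallback: the document carries no type, so try the other schemas.
        for type_name, reader in READERS.items():
            if type_name == expected:
                continue
            try:
                actual = reader(doc)
            except DeserializationError:
                continue
            raise DocumentTypeMismatch(
                f"document type {actual} did not match expected type {expected}")
        raise original
```

In this model, an action document whose kind was removed from the manifest fails the action schema, matches the trigger schema in the fallback, and surfaces as a type mismatch rather than as an unknown-kind error. Recording the type in the document would remove the need to guess.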
Dangers of renaming and removing runtime kinds
Hello OpenWhisk community members,

Recently, PR #4390 [1] renamed runtime kind "java" to "java:8". While a change like this looks harmless at first sight, it breaks all existing actions of this kind. This may not be important for developers and "occasional" usage of OpenWhisk - but it affects production deployments. Production deployments with existing actions require additional migration steps when renaming or removing runtime kinds.

I opened PR #4627 [2] to improve documentation. Said PR also adds "documentation" to the pre-defined OpenWhisk runtime manifest files to make developers aware that renaming or removing runtime kinds can cause problems.

There is another area that should be improved - but I need help to better understand this area...

When trying to create an action with a kind that does not exist, a reasonable error message is created:

    $ wsk action create kind-does-not-exist tests/dat/actions/hello.js --kind nodejs:4
    error: Unable to create action 'kind-does-not-exist': The request content was malformed:
    kind 'nodejs:4' not in Set(dotnet:2.2, go:1.11, nodejs:10, ballerina:0.990, ruby:2.5, nodejs:18, blackbox, swift:4.2, java:8, sequence, nodejs:6, nodejs:12, python:3, python:2, php:7.3) (code 33bfb55ce44d1dd0bc6e662c49ea9391)

When trying to display the metadata of an action whose kind does not exist, the resulting error message is not helpful at all:

    $ wsk action get kind-does-not-exist
    error: Unable to get action 'kind-does-not-exist': Resource by this name exists but is not in this collection. (code 4761468230c344417fd61cdca5922e52)

* My conclusion from looking into controller logs and code is that deserialization of the ExecMetaDataBase object fails with a DeserializationException [3].
* This exception fails the "try" block in StoreUtils.deserialize(), leading to a fall-back read in the "catch" block. This fall-back read seems to return a WhiskTrigger instead of a WhiskActionMetaData so that a DocumentTypeMismatchException is thrown [4]. The resulting message can be found in controller logs: "document type class org.apache.openwhisk.core.entity.WhiskTrigger did not match expected type class org.apache.openwhisk.core.entity.WhiskActionMetaData.".
* As a result, getEntity() [5] fails with the misleading error message mentioned above and HTTP status code 409 (Conflict).

Can somebody explain why [4] has a fall-back and which scenarios are addressed by this?

In our scenario, ExecMetaDataBase should probably throw an UnknownRuntimeKindException and StoreUtils.deserialize() should not have a fall-back for this exception.

[1] https://github.com/apache/openwhisk/pull/4390
[2] https://github.com/apache/openwhisk/pull/4627
[3] https://github.com/apache/openwhisk/blob/2036548e62dbf959d91c2328e86318bd7cfa656f/common/scala/src/main/scala/org/apache/openwhisk/core/entity/Exec.scala#L445-L450
[4] https://github.com/apache/openwhisk/blob/2036548e62dbf959d91c2328e86318bd7cfa656f/common/scala/src/main/scala/org/apache/openwhisk/core/database/StoreUtils.scala#L58-L67
[5] https://github.com/apache/openwhisk/blob/be1e3a19c02956c9be85023a0bb0ff399c21444d/core/controller/src/main/scala/org/apache/openwhisk/core/controller/ApiUtils.scala#L148-L150

Regards,

Sven Lange-Last
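The proposal in the last paragraph, a dedicated exception that bypasses the fall-back, might be sketched as follows. This is an assumption-laden illustration and not the OpenWhisk implementation: `UnknownRuntimeKind`, `read_action`, and `get_action` are hypothetical names, and the fall-back is reduced to a single error translation.

```python
class DeserializationError(Exception):
    """The document does not match the schema of the requested type."""


class UnknownRuntimeKind(Exception):
    """Hypothetical: the document is an action, but its kind is not in the manifest.

    Deliberately not a DeserializationError, so the fall-back below ignores it.
    """


def read_action(doc, known_kinds):
    if "exec" not in doc:
        raise DeserializationError("no exec section, not an action")
    kind = doc["exec"].get("kind")
    if kind not in known_kinds:
        raise UnknownRuntimeKind(
            f"action '{doc.get('name')}' uses unknown kind '{kind}'")
    return doc


def get_action(doc, known_kinds):
    try:
        return read_action(doc, known_kinds)
    except DeserializationError:
        # Only genuine schema mismatches are translated into this message.
        raise LookupError(
            "Resource by this name exists but is not in this collection.")
```

Because the unknown-kind error is a separate exception type, it propagates to the caller unchanged and could be mapped to a message that names the missing kind instead of the 409 response.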
Re: Backpressure for slow activation storage in Invoker
On 9/16/19, 8:32 AM, "Chetan Mehrotra" wrote:

> Hi Tyson,
>
> > in case of logs NOT in db: when queue full, publish non-blocking to "completed-non-blocking"
>
> The approach I was thinking was to completely disable (configurable) support for persisting activation from Invoker and instead handle all such work via activation persister service.

That sounds fine. I thought there was a suggestion to try to optimize the storage path by only diverting to Kafka in case the memory queue is full. I agree it is simpler to treat everything the same.

Thanks
Tyson
Re: Please submit topics for this week's (Wed. 18th) Tech. Interchange call!
Hi Matt -

Please add: Dan McWeeney - present some prototype code related to execution design discussion.

Thanks!
Tyson

On 9/16/19, 6:03 AM, "Matt Rutkowski" wrote:

> Hello Whiskers!
>
> Please submit items for agenda for this Wednesday’s (Sept 18) Tech Interchange call.
>
> Some topics I already have "penciled in" include:
> * Proposal for new Tech. Int. Meeting time(s) - Dom
> * JVM Pre-cache optimization work in Java runtime - Matt
> * OpenWhisk Tekton Pipeline update - Priti
>
> Looking forward!
> Matt
>
> Day-Time: Wednesday Sept 18, 11AM EDT (Eastern US), 5PM CEST (Central Europe), 3PM GMT, 11PM (Beijing)
> Zoom: https://zoom.us/my/asfopenwhisk
OpenWhisk Execution Design
Hi,

Here is a more detailed document regarding the execution design that I briefly discussed at the last meeting.

https://docs.google.com/document/d/1A8IyQ2Zjjl6WPc41DBWJa28bp7jEs46bvXVO_H77yBY/edit?usp=sharing

Please review and comment. Dan McWeeney will provide a brief demo of some prototype code at this week’s meeting.

Related: to be able to provide a PR to the core repo that includes experimental code, Dan submitted a PR to exclude a directory from code scanning.

https://github.com/apache/openwhisk-utilities/pull/71

Thanks
Tyson
Re: Backpressure for slow activation storage in Invoker
Hi Tyson,

> in case of logs NOT in db: when queue full, publish non-blocking to
> "completed-non-blocking"

The approach I was thinking was to completely disable (configurable) support for persisting activation from Invoker and instead handle all such work via activation persister service.

Supporting a queue-full-based approach is tricky: because we store the activation after the active ack, it would be hard to tell which activations in the Kafka completed queue are there because the queue was full. Otherwise, ContainerProxy would first have to place the item in the queue, check whether it is full, and then add a marker to the activation sent on the "completed" queue to indicate the overflow case.

Chetan Mehrotra

On Fri, Sep 13, 2019 at 3:14 AM Tyson Norris wrote:
> I think this sounds good, but want to be clear I understand the consumers and producers involved - is this summary correct?
>
> Controller:
> * consumes "completed-" topic (as usual)
> Invoker:
> * in case of logs NOT in db: when queue full, publish non-blocking to "completed-non-blocking"
> * in case of logs in db: when queue full, publish all to "Activations" topic
> OverflownActivationRecorderService (new service):
> * in case of logs NOT in db: consumes "completed-*" topic(s) AND "completed-non-blocking" topic
> * in case of logs in db: consumes "Activations" topic
>
> Thanks!
> Tyson
>
> On 9/11/19, 4:51 AM, "Chetan Mehrotra" wrote:
> > As part of implementing this feature I came across support for topic patterns in Kafka [1] [2]. It seems to allow listening to multiple topics by the same consumer or a group of consumers. So after discussing with Sven (thanks Sven!) I came up with the following proposal.
> >
> > With this I think we can go back to "Option B1 - Activations via controller topic" and thus subscribe to the "completed-.*" pattern.
> >
> > This would help by avoiding any extra load on Kafka as we consume the same activation result messages that are sent to the Controller. However, there are a few caveats:
> >
> > 1. Currently we send the activation result via Kafka only for blocking calls
> > 2. The result sent does not contain logs
> >
> > So we can possibly have support for 2 modes
> >
> > Option CB1 - Existing topic + new topic for non blocking result
> > ---
> >
> > This mode would be used if the setup does not record the logs in db. In this mode we would add support in Invoker to also send the result for non blocking calls to a new "completed-non-blocking" topic and then listen for "completed-.*"
> >
> > Option CB2 - New topic + KafkaActivationStore
> > --
> > This mode can be used if the setup stores logs in db. Here we would have a new KafkaActivationStore which would send the activations to a new "activations" topic
> >
> > The ActivationPersister service can support both modes and the cluster operator can configure it in the required mode
> >
> > Chetan Mehrotra
> > [1] https://doc.akka.io/docs/alpakka-kafka/current/subscription.html#topic-pattern
> > [2] https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#subscribe-java.util.regex.Pattern-org.apache.kafka.clients.consumer.ConsumerRebalanceListener-
> >
> > On Mon, Jun 24, 2019 at 11:57 PM Chetan Mehrotra wrote:
> > >
> > > > For B1, we can scale out the service as controllers are scaled out, but it
> > > > would be much complex to manually assign topics.
> > >
> > > Yes, that's what my concern was in B1. So would for now target B2 approach where we have a dedicated new topic and then have it consumed by a new service. If it poses problem down the line then we can go for B1. B
> > >
> > > Chetan Mehrotra
> > >
> > > On Tue, Jun 25, 2019 at 10:08 AM Dominic Kim wrote:
> > > >
> > > > Let me share a few ideas on them.
> > > >
> > > > Regarding option B1, I think it can scale out better than option B2.
> > > > If I understood correctly, scaling out of the service will be highly dependent on Kafka.
> > > > Since the number of consumers is limited to the number of partitions, the number of service nodes will be also limited to the number of partitions.
> > > >
> > > > So in the case of B2, if we create a new topic with some partition numbers, we cannot scale out the service nodes more than that.
> > > > At some point, we may need to alter the number of pa
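To make the "completed-.*" subscription from the proposal above concrete, the sketch below shows which topic names such a pattern would select. It uses only Python's `re` module and no Kafka client. The topic names are hypothetical examples, and the sketch assumes the pattern has to match the whole topic name.

```python
import re

# Pattern from the proposal; assumed to be matched against the full topic name.
TOPIC_PATTERN = re.compile(r"completed-.*")


def matching_topics(topics):
    """Return the topics a pattern-based subscription would pick up."""
    return [topic for topic in topics if TOPIC_PATTERN.fullmatch(topic)]


# Hypothetical topic names for illustration only.
topics = [
    "completed-0",
    "completed-1",
    "completed-non-blocking",
    "activations",
    "invoker0",
]
```

Under these assumptions a single subscription would cover both the per-controller "completed-" topics and the new "completed-non-blocking" topic of Option CB1, while the "activations" topic of Option CB2 would need its own subscription.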
Please submit topics for this week's (Wed. 18th) Tech. Interchange call!
Hello Whiskers!

Please submit items for the agenda for this Wednesday’s (Sept 18) Tech Interchange call.

Some topics I already have "penciled in" include:
* Proposal for new Tech. Int. Meeting time(s) - Dom
* JVM Pre-cache optimization work in Java runtime - Matt
* OpenWhisk Tekton Pipeline update - Priti

Looking forward!
Matt

Day-Time: Wednesday Sept 18, 11AM EDT (Eastern US), 5PM CEST (Central Europe), 3PM GMT, 11PM (Beijing)
Zoom: https://zoom.us/my/asfopenwhisk
RE: [DISCUSS}: release "cli group" of projects
+1

Thank you Chetan. wskdeploy has had a few bug fixes and is due for release...

Kind regards,
Matt

From: Chetan Mehrotra
To: dev@openwhisk.apache.org
Date: 09/15/2019 11:23 PM
Subject: [EXTERNAL] Re: [DISCUSS}: release "cli group" of projects

+1 for version 1.0 for cli projects

Chetan Mehrotra

On Sat, Sep 14, 2019 at 5:43 AM Carlos Santana wrote:
>
> +1 and version 1.0
>
> - Carlos Santana
> @csantanapr
>
> > On Sep 13, 2019, at 10:46 AM, Rodric Rabbah wrote:
> >
> > +1 for 1.0
> >
> >> On Fri, Sep 13, 2019 at 10:23 AM David P Grove wrote:
> >>
> >> I'd like to make a release of the 3 "cli group" projects:
> >> openwhisk-client-go, openwhisk-wskdeploy, openwhisk-cli.
> >>
> >> The main motivation is to pick up the fix for a bug [1] in wskdeploy, which
> >> causes the `wsk project` subcommand to crash in some common usage scenarios
> >> in the 0.10.0 release.
> >>
> >> It looks to me like the current master branch is stable with no pending PRs
> >> that need to be merged. If I missed something, please comment on this
> >> thread.
> >>
> >> One item for discussion is whether we should number this release as 0.11.0
> >> or go ahead and call it 1.0.0. To me it seems like the cli api is fairly
> >> stable, so going to a 1.x.y numbering seems plausible. But I don't work on
> >> the cli tools, so I might be overlooking a reason to stay with a 0.x
> >> number.
> >>
> >> thanks,
> >>
> >> --dave
> >>
> >> [1] https://github.com/apache/openwhisk-wskdeploy/issues/1050
testing activation polling on/off
When invoking an action, the controller waits on a promise of the result, which completes in one of two ways: via the active ack (the response from the invoker) or from the activations database. The latter is protected by a deployment flag and may not be enabled. However, our tests did not cover both cases: with database polling and without. I opened a PR to address this:

https://github.com/apache/openwhisk/pull/4623

As a side note, the PR also moves the time the controller waits before it terminates the HTTP response to a deployment configuration. This has the added benefit that some tests which took 1 minute each can now run with custom time limits (which I set to a few seconds).

-r
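The waiting behaviour described above can be modelled with a small asyncio sketch. It is a hedged illustration of the two completion paths and the configurable wait, not the controller's Scala code: `wait_for_activation`, `poll_enabled`, `max_wait`, and the demo payloads are all names assumed for this example.

```python
import asyncio


async def wait_for_activation(active_ack, poll_db, max_wait, poll_enabled=True):
    """Return the first result to arrive, or None once max_wait seconds pass."""
    waiters = [asyncio.ensure_future(active_ack)]
    if poll_enabled:
        # Database polling is optional, mirroring the deployment flag.
        waiters.append(asyncio.ensure_future(poll_db()))
    done, pending = await asyncio.wait(
        waiters, timeout=max_wait, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return next(iter(done)).result() if done else None


async def demo():
    loop = asyncio.get_running_loop()

    async def slow_poll():
        await asyncio.sleep(1)
        return {"source": "db"}

    # Case 1: the active ack arrives before the database poll finishes.
    ack = loop.create_future()
    loop.call_later(0.01, ack.set_result, {"source": "active-ack"})
    first = await wait_for_activation(ack, slow_poll, max_wait=0.5)

    # Case 2: no ack and polling disabled, so the wait simply times out.
    silent_ack = loop.create_future()
    second = await wait_for_activation(
        silent_ack, slow_poll, max_wait=0.05, poll_enabled=False)
    return first, second
```

Running `asyncio.run(demo())` exercises both configurations with waits far below one minute, which is the kind of speed-up a configurable time limit gives the tests.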